Displaying a long sequence of images in a short amount of time

ABSTRACT

A method and apparatus is disclosed herein for displaying a long sequence of images in a short amount of time. In one embodiment, the method comprises selecting page images for display from a plurality of page images corresponding to an image-based document, the plurality of page images including merged page images and each merged page image having salient content from multiple successive page images of the image-based document merged into one image; and serially displaying the selected page images on a display.

RELATED APPLICATIONS

This application is related to co-pending U.S. patent application Ser. No. 11,346,493, filed on Feb. 1, 2006, entitled “Compensating for Cognitive Load in Jumping Back,” assigned to the corporate assignee of the present invention; U.S. patent application Ser. No. 11,346,856, filed on Feb. 1, 2006, entitled “Enhancing Accuracy of Jumping by Incorporating Interestingness Estimates,” assigned to the corporate assignee of the present invention; and U.S. patent application Ser. No. 11/346,498, filed on Feb. 1, 2006, entitled “Avoiding Disorientation under Discontinuous Navigation in an Image Flipping System,” assigned to the corporate assignee of the present invention.

FIELD OF THE INVENTION

The present invention relates to the field of displaying images and image processing; more particularly, the present invention relates to displaying sequences of images quickly while preserving recognizability.

BACKGROUND OF THE INVENTION

Physically paging through a large paper document is a process well-known to individuals. One generally twists the edge of the document, allowing the thumb to let the pages freely open in sequence. Depending on the twisting and thumb force applied, the pages either open slowly or quickly. One can quickly move to any spot in the document this way, and see each page in order at whatever speed is desired. There are no limits to the speed used, other than one's own ability to process information. If someone has made an annotation on a page of that paper document, in an ink of contrasting color, the annotations are often easy to see and find even when casually flipping through the document at very high speed.

In the electronic arena, the electronic analogue to flipping pages of a document is scrolling of documents using some type of scroll bar, which is well known in the art. For example, one usually acquires control of a thumb bar using a pointing device (e.g., a mouse). When the thumb bar is moved, pages from the document are displayed by choosing the page that correlates to the current thumb bar position and displaying it. Once the display of one page is complete, the process is repeated. If the thumb bar is moved slowly enough, then every page can be seen. Typically, though, if the thumb bar is traversed quickly, only a small subset of the pages is displayed. There are alternatives to these electronic systems. Some allow the user to view an array of page thumbnails. Such systems work well for a small number of pages, but scale poorly for large documents.

Clearly, for navigating by visual appearance and recognition, the current electronic systems are vastly inferior to the physical process. One can easily miss important pages, and one gains little sense of the document's contents or structure during the navigation process.

Of course, there are many problems that prevent electronic systems from emulating the experience of paper. One key problem is “display refresh rate.” The hardware display system of an electronic device repaints its image at some fixed rate. A typical computer display can only refresh about 60-80 times per second. This constitutes an upper limit on how frequently the image on a display device can be changed.

Furthermore, in order to render any image on a display device, some (usually software) component of the system must arrange to place the image in the device's “display buffer.” If the image must be retrieved from a disk or composed on the fly from some symbolic format, it may take longer to produce and place in the display buffer than the refresh period of the display. This “system frame rate” may impose another, more stringent, limit on the speed at which a display device can show a series of images.

Thus, both the display rate and the system frame rate seem to impose limits on how rapidly a user can scan through a document without “skipping” some of its pages. This is, indeed, how a DVD player or VCR plays faster than its standard speed, by skipping many frames. But skipping pages in a document a user is scanning through is a very bad idea, since one is likely to miss the feature for which the search is being performed.

The VCR, DVR, or DVD system, or, somewhat more remotely, an audio tape or CD audio system, allows performing a fast forward operation. Most such systems simply stop where they are when the user tells them to stop. There are TiVo systems that do make an effort to jump back to the point at which the user wanted to stop. Since these systems allow fast-forwarding at different speeds, they do jump back different amounts depending on the speed at which they had been running forward.

There are systems that perform a fast forward operation that automatically stops. An example of such a system is a television program playback system. In a television program playback system, when to stop scanning forward to skip commercial advertisements is controlled by changes in chroma balance, which signal the end of a commercial and the resumption of the desired program material.

Image-based document analysis systems perform similar functions. In this field, systems analyze images to find patterns and try to infer structure. Such inferred structure can then be used to aid navigation or to help users find images or documents of interest. Feature-oriented systems exist that help users recognize previously seen documents or images by refining search.

The problem of aiding navigation by providing the user of a page- or image-oriented system some kind of information to help orient the individual as to where the current focus is within the surrounding context is not new. Indeed much work has been done on such “focus and context” supporting systems within the Human-Computer Interaction research and development communities. But such systems perform this function by displaying some kind of overview within which the user can understand the context of the focused page or image.

Information visualization systems discover similarities in underlying data and try to present them visually in such a way that the user (rather than the system) can distinguish patterns or interesting phenomena.

SUMMARY OF THE INVENTION

A method and apparatus is disclosed herein for displaying a long sequence of images in a short amount of time. In one embodiment, the method comprises selecting page images for display from a plurality of page images corresponding to an image-based document, the plurality of page images including merged page images and each merged page image having salient content from multiple successive page images of the image-based document merged into one image; and serially displaying the selected page images on a display.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.

FIG. 1 is a flow diagram of one embodiment of a process for displaying images;

FIG. 2 illustrates the identification of salient content of a page;

FIG. 3 is a flow diagram of one embodiment of a process for performing segmentation to identify the salient content of a page;

FIG. 4 illustrates an example of a histogram;

FIG. 5 illustrates the merging process that combines the salient features of multiple pages into a single merged page;

FIG. 6 is a flow diagram of one embodiment of a process for combining the salient features of multiple pages into a single merged image;

FIG. 7 is a flow diagram of one embodiment of a speed control process;

FIG. 8 is a flow diagram for the control process for the page refresh timer that refreshes the pages shown on the screen;

FIG. 9 is a flow diagram of one embodiment of the process for jumping back;

FIG. 10 is one embodiment of a system to perform the segmentation and merging system processes;

FIG. 11 illustrates the run-time application system portion for displaying the page images;

FIG. 12 is a flow diagram of another embodiment of a process for displaying images;

FIG. 13 is a flow diagram of yet another embodiment of a process for displaying images;

FIG. 14 is a flow diagram of still another embodiment of a process for displaying images;

FIG. 15 is a flow diagram of one embodiment of a process for extracting and scoring features;

FIG. 16 is a flow diagram of one embodiment of a process for computing a feature value;

FIG. 17 illustrates a document in which most pages have no pictures and a few have large pictures;

FIG. 18 is a flow diagram of one embodiment of a process for creating distribution tables for each feature;

FIGS. 19A and 19B are flow diagrams of one embodiment of a process for creating percentile rankings for each score for each feature;

FIG. 20 is a flow diagram of one embodiment of a process for calculating the “mode,” or most popular score, for each feature;

FIG. 21 is a flow diagram of one embodiment of a process for determining each page's importance;

FIG. 22 is a flow diagram of one embodiment of a normalization process; and

FIG. 23 is a block diagram of an exemplary computer system that may perform one or more of the operations.

DETAILED DESCRIPTION OF THE PRESENT INVENTION

A method and system for viewing, exchanging and manipulating electronic documents is described. In one embodiment, a display system can rapidly display the contents of a large document. The speed at which the pages are displayed may approach a thousand per second. In one embodiment, this is accomplished by combining multiple pages into a single page image. For example, the system may combine, or merge, a group of 16 pages into a single image. The merged image preserves the distinctive content of the pages, along with the general layout of the pages that were merged. The distinctive, or salient, content includes recognizable features. Easily recognizable features include, for example, pictures, charts, tables, chapter headings, section boundaries, index tabs, hand-drawn annotations, etc. For purposes herein, these salient (e.g., prominent, eye-catching, easily recognized) features are referred to as “document landmarks.” These salient contents are noticeable to a user when scanning quickly through multiple groups of pages. This may be facilitated by displaying pages at a smooth and predictable rate. In one embodiment, in such a system each and every page will be made visible during the page navigation process.

Page images may be annotated by a user when displayed. In one embodiment, the method of annotation allows marks to be easily found during rapid navigation.

In the following description, numerous details are set forth to provide a more thorough explanation of the present invention. It will be apparent, however, to one skilled in the art, that the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the present invention.

Some portions of the detailed descriptions which follow are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing” or “computing” or “calculating” or “determining” or “displaying” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

The present invention also relates to apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus.

The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description below. In addition, the present invention is not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.

A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read only memory (“ROM”); random access memory (“RAM”); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other form of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.

Overview of Merged Page Display

A system for displaying rapidly the pages of documents in such a way as to provide a paper-like experience is described. The system performs segmentation on pages of the document to identify salient, landmark features. These features from multiple pages are merged onto single page images. The system displays these “merged page images” that contain the landmarks and only some of the ordinary text of several consecutive pages.

In one embodiment, there are multiple sets of merged page images. For example, one set may be where each page image includes merged salient content from two pages. Another set includes merged page images with merged salient content from four pages, another set merges 8 pages, another set merges 16 pages, etc. Thus, in one embodiment, merged page images are enhanced to emphasize document landmarks and de-emphasize ordinary text.

Displaying such merged images enables overcoming refresh rate and frame rate limitations. To illustrate, if the combined hardware/software system is capable of rendering and displaying thirty page-sized images per second, the user is able to speed through 120 pages per second by composing thirty merged images, each of which contains the salient content of four consecutive pages. In one embodiment, the speed at which the page images are being displayed controls which set is displayed. As the speed of traversing the images increases, the page images being displayed represent a greater number of merged pages.
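The arithmetic in the preceding paragraph can be sketched as follows. This is only an illustration under stated assumptions: the function name `merge_level_for_speed`, its parameters, and the choice of the smallest sufficient merge level are hypothetical and are not taken from the described embodiments.

```python
def merge_level_for_speed(target_pages_per_sec, max_images_per_sec,
                          levels=(1, 2, 4, 8, 16, 32, 64)):
    """Return the smallest merge level (pages per merged image) that reaches
    the requested page rate without exceeding the image rate the
    hardware/software system can sustain.

    The candidate levels and the "smallest sufficient level" policy are
    assumptions made for this sketch.
    """
    for level in levels:
        # Effective page rate = pages per merged image * images per second.
        if target_pages_per_sec <= level * max_images_per_sec:
            return level
    # The request exceeds what the coarsest set can deliver; use that set.
    return levels[-1]


# Thirty images per second with four pages merged per image covers
# 120 pages per second, as in the example in the text.
level = merge_level_for_speed(120, 30)
print(level, level * 30)
```

Choosing the smallest sufficient level keeps each displayed image as close as possible to a single page; other policies are equally plausible.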

In one embodiment, hand-drawn annotations are combined with the merged images as they are displayed. Alternatively, the annotations are merged with the identified salient content features to be part of the merged images. In either case, such annotations are displayed.

In one embodiment, the user controls the speed and direction of navigation. This may be through a user input device such as, for example, a shuttlewheel.

When the user stops in the middle of rapid scanning, the system attempts to “jump back” to the page at which it concludes the user intended to stop (e.g., the image that probably caught the user's attention).

FIG. 1 is a flow diagram of one embodiment of a process for displaying images. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 1, the process begins by identifying salient content on pages of the image-based document (processing block 101). In one embodiment, processing logic identifies the salient content by computing a histogram of each page of the image-based document and finding features based on information from the histogram.

After the salient content on the pages of the image-based document has been identified, processing logic creates page images by merging identified salient content of multiple pages of the image-based document into single pages (processing block 102).

Once the merged pages have been created, processing logic selects page images for display from multiple page images corresponding to an image-based document (processing block 103). The multiple page images include merged page images, and each merged page image has salient content from multiple successive page images of the image-based document merged into one image.

In one embodiment, selecting the page images for display is performed by selecting a subset of the page images based on control information (e.g., a user input) indicative of a rate at which the selected page images are to be serially displayed. In one embodiment, the page images include a first set of merged page images having salient content from a number of page images of the image-based document merged into one image and a second set of merged page images having salient content from a different number of page images of the image-based document merged into one image. Furthermore, in one embodiment, processing logic selects the page images for display by selecting the first set of merged page images if the speed, or rate, at which the document is to be displayed is at one rate and selecting the second set of merged page images if the rate at which the document is to be displayed is at a different rate.

After selecting the page images, processing logic serially displays the selected page images on a display (processing block 104). Serially displaying selected merged page images enables the document to be scanned at a speed exceeding a refresh rate of the display. The selected page images are displayed in an order, in either a forward or reverse (backward) direction.

While serially displaying the selected page images, processing logic may jump back to one or more selected page images based on the input control (processing block 105). Processing logic may jump back to a page image based on an interestingness measure associated with the page image.

Note that the following will be described in terms of pages of a document. However, the page images need not come from a document. For example, the pages may be images of a number of files or a stack of documents. Those techniques described herein may be used to perform a search of a number of files, such as searching a hard drive, for specific content.

Page Segmentation

To facilitate the rapid navigation, merged page images are created with the salient content, or landmarks. Initially, prior to merged page image creation, the system identifies document landmarks. In one embodiment, this is done in real time when the document is being displayed. In another embodiment, identification of landmarks occurs off-line before the document is being displayed. The segmentation process is performed for every document that is to be displayed rapidly. Thus, efficient techniques for such segmentation, as well as a good characterization of what constitutes a landmark in a given document, are valuable.

FIG. 2 illustrates the identification of salient content of a page. Referring to FIG. 2, a document 201 is shown having an image. As a result of the segmentation process, shown in document 202, the salient content of the page has been identified as title 210, image 211 and heading 212.

There are a number of segmentation techniques that may be used. In one embodiment, segmentation is based on a horizontal histogram of each page. FIG. 3 is a flow diagram of one embodiment of a process for performing segmentation to identify the salient content of a page. In one embodiment, this process is performed on each page of a document and the salient content identified may be subsequently included in merged page images. The process is performed by processing logic, which may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 3, the process begins by processing logic computing a histogram for the page (processing block 301). In one embodiment, the histogram is a horizontal histogram that may be considered a tally of all the non-background pixels in each scan line of a page image. This is essentially the image that results from shifting all the black dots on a white-background page all the way to the left. One such histogram is shown in FIG. 4. Referring to FIG. 4, a bar graph for each scan line is used. The bars start at the left and extend towards the right. In one embodiment, the bars are only one pixel high but most would be fairly close to the length of their neighbors above and below. However, the bars representing those scan lines that fell between lines of text would have length zero.
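A minimal sketch of such a horizontal histogram follows, assuming the page image is available as a list of scan lines, each a list of pixel values, with a single known background value. The representation and the name `horizontal_histogram` are assumptions made for illustration.

```python
def horizontal_histogram(page, background=0):
    """Tally the non-background pixels in each scan line of a page image.

    `page` is assumed to be a list of scan lines (rows), each a list of
    pixel values; `background` is the value that counts as blank paper.
    """
    return [sum(1 for pixel in row if pixel != background) for row in page]


# A tiny 4x4 "page": 1 marks ink, 0 marks background.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(horizontal_histogram(page))
```

Each entry is the length of one histogram bar; scan lines that fall between lines of text yield zero.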

After computing the histogram, processing logic computes the heights of contiguous blocks of non-zero-length histogram bars (usually the total font height for a line of text on the page) and the separations between such lines of text (processing block 302). After computing the line heights and separations, processing logic finds headers and footers (processing block 303), finds columns (processing block 304), finds blank areas (processing block 305), finds tables of data (processing block 306), finds graphical images (processing block 307), and finds text areas (processing block 308). Then processing logic also analyzes text for headings, boldness, indentation, justification, line heights, line separation, centering, etc. (processing block 309).
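The first step of this analysis (processing block 302) can be sketched as below. The sketch assumes the histogram is a list of per-scan-line counts, and it treats only the blank runs lying between two non-blank blocks as separations, ignoring top and bottom margins; that treatment and the name `line_heights_and_gaps` are assumptions.

```python
from itertools import groupby


def line_heights_and_gaps(histogram):
    """Measure contiguous blocks of non-zero bars and the blank runs between them.

    Returns (heights, gaps): heights are the lengths of runs of non-zero
    bars (typically one line of text each); gaps are the lengths of zero
    runs that separate two such blocks.
    """
    # Collapse the histogram into (is_non_zero, run_length) pairs.
    runs = [(non_zero, len(list(group)))
            for non_zero, group in groupby(histogram, key=lambda count: count > 0)]
    heights = [length for non_zero, length in runs if non_zero]
    # Skip the first and last runs so page margins are not counted as gaps.
    gaps = [length for index, (non_zero, length) in enumerate(runs)
            if not non_zero and 0 < index < len(runs) - 1]
    return heights, gaps


print(line_heights_and_gaps([0, 0, 5, 6, 5, 0, 0, 7, 7, 0]))
```

A block much taller than the typical line height, for example, would be a candidate picture or table rather than a line of text.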

In other words, by examining the histogram, distinctive “signatures” of pictures and other graphics and tables of data may be identified. For example, by looking at the page as a whole and at the typical breadth of connected areas (bands without any blank area), paragraphs in the plain text may be identified. Furthermore, knowing that the histogram corresponds to the top of a page, the absence of a bar may indicate that the first bar at the top of the histogram refers to some type of header and the remainder of the bars correspond to plain text. To identify chapter and section headings, bold lettering and centered text are detected. This involves exploring the pixels on each line of text. If a line is centered or is mostly bold, in one embodiment, it is considered a heading. The comparisons may be made by looking at the width of every transition between black and white and comparing it to the typical one on the page.

Note that other segmentation routines may be utilized. These include super efficient or parallelized (retinal) algorithms.

Document Representation

In one embodiment, merged pages are computed based on a set of consecutive pages. There are several ways to represent the results of page segmentation that expedite the display of merged page images. In one embodiment, landmarks identified from page segmentation are incorporated into page images, each of which represents multiple pages merged together. These landmarks become enhanced on the combined page images.

In one embodiment, merged images are pre-computed and stored. This includes all the merged images the system is likely to require for any given document. This probably imposes the least demands on the system at display time.

In one embodiment, merged images may be the result of merging two, four, eight, etc. pages up to a value (e.g., a power of two) such that there are about 12 merged images at that level. Thus, in representing a 500-page book, images that represent the merging of 32 pages each are computed (as well as merges of 2, 4, 8, and 16 pages each). These sets of merged images having images representing a different number of merged pages are used by the system based on how slow or fast a user is navigating through a document. If a user is navigating slowly, the set having images representing two pages may be used, while if a user is navigating rapidly, the set having images representing 16 pages may be used. Depending on the size of the document, merged images may be computed for as many as 64 consecutive pages.
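One possible reading of this rule is sketched below. It interprets “about 12 merged images” as “at least 12 merged images remain at the coarsest level”, which reproduces the 500-page example but is an assumption, as are the name `merge_levels` and its parameters.

```python
def merge_levels(page_count, min_images=12, max_level=64):
    """List the power-of-two merge levels to pre-compute for a document.

    Levels double (2, 4, 8, ...) while the document still yields at least
    `min_images` merged images at that level, up to `max_level` pages per
    merged image. The "at least 12" threshold is an assumed interpretation.
    """
    levels = []
    level = 2
    while level <= max_level and page_count / level >= min_images:
        levels.append(level)
        level *= 2
    return levels


# For a 500-page book: 500 / 32 is about 15.6 images, 500 / 64 is about 7.8.
print(merge_levels(500))
```

Under this reading, a very short document gets no merged sets at all, and documents of roughly 768 pages or more reach the 64-page level.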

In one embodiment, the process of creating merged images includes initially creating a blank image, taking all the landmark elements discovered during segmentation and painting them onto the blank image. Thereafter, for all the areas that are still blank, yet have some text on any of the pages that are being merged, the system fills those areas with a very washed out or faded version of text that is shown in one of the pages. This makes the landmarks appear enhanced.

FIG. 5 illustrates the merging process that combines the salient features of multiple pages into a single merged page. Referring to FIG. 5, page 501 has been segmented and the identified salient content of page 501 includes heading 502 and section title 503. Similarly, page 510 has been segmented and the identified salient content includes heading 511, image 512 and section title 513. The salient features of page 501 and page 510 are combined into page 520. Although not visible on page 520, heading 511 of page 510 is somewhat obscured by heading 502 of page 501.

One complication in handling of landmarks from multiple pages is landmarks that overlap. In one embodiment, the images of overlapping salient content are painted in the order of decreasing area. In this case, the bigger images do not occlude the smaller ones. These images may also be painted with decreasing “alpha values” (i.e., they are painted with increasing transparency).

FIG. 6 is a flow diagram of one embodiment of a process for combining the salient features of multiple pages into a single merged image. The process may be performed by processing logic which may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 6, the process begins by processing logic creating a blank image (processing block 601). Next, processing logic locates sets of overlapping landmarks in the multiple pages (processing block 602). (Note that a “set” may consist of a single landmark.) Processing logic then determines whether there are any overlapping sets left to process (processing block 603). If so, the process then goes on to process the one or more sets of overlapping landmarks remaining.

More specifically, processing logic selects a set and orders the landmarks in the set by decreasing size (processing block 604). Then processing logic selects the next element from the ordered set and paints the image with increasing transparency, such that those landmarks covering one or more other landmarks are transparent enough to see some portion of the landmarks “beneath” them (processing block 605). Thereafter, processing transitions to processing block 606 where processing logic determines whether there are any elements remaining in the current set of overlapping landmarks. If so, processing transitions to processing block 605 where the next element is selected and painted on the image. If not, processing logic transitions to processing block 603 where processing logic again tests whether there is any set of overlapping landmarks that remain. If not, processing logic transitions to processing block 607.

At processing block 607, processing logic collects or identifies the unpainted areas of the image. Processing logic tests whether there are any unpainted areas that remain (processing block 608). If not, the process stops. If there are unpainted areas that remain, processing logic iterates through them in an arbitrary order (processing block 609). For each one, processing logic tests whether there is text on any of the multiple pages being merged that is in that unpainted area (processing block 610). If not, processing logic transitions to processing block 608 to continue the iteration. If there is text on any of the multiple pages that are being merged into a single image, processing logic selects one of those pages arbitrarily and paints the unpainted area with a faded version of the text taken from that page (processing block 611), and then processing transitions to processing block 608.
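The landmark-painting portion of FIG. 6 (processing blocks 602 through 606) might be sketched as follows. The sketch only produces a paint plan, that is, the order and opacity for each landmark; it does not render pixels or fill unpainted areas with faded text. The `Landmark` record, the rectangle-overlap test, and the `alpha_step` and `min_alpha` values are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Landmark:
    """A salient region found by segmentation (hypothetical record)."""
    page: int
    x: int
    y: int
    w: int
    h: int

    @property
    def area(self):
        return self.w * self.h


def overlaps(a, b):
    """True if the bounding rectangles of two landmarks intersect."""
    return (a.x < b.x + b.w and b.x < a.x + a.w and
            a.y < b.y + b.h and b.y < a.y + a.h)


def overlapping_sets(landmarks):
    """Group landmarks into sets connected by overlap (a set may hold one)."""
    sets = []
    for landmark in landmarks:
        touching = [s for s in sets if any(overlaps(landmark, other) for other in s)]
        sets = [s for s in sets if not any(s is t for t in touching)]
        sets.append([landmark] + [other for s in touching for other in s])
    return sets


def paint_plan(landmarks, alpha_step=0.25, min_alpha=0.25):
    """Return (landmark, alpha) pairs in painting order.

    Within each overlapping set the largest landmark is painted first and
    fully opaque; each later, smaller landmark is painted with a lower
    alpha so that what lies beneath it remains partly visible.
    """
    plan = []
    for group in overlapping_sets(landmarks):
        ordered = sorted(group, key=lambda lm: lm.area, reverse=True)
        for index, landmark in enumerate(ordered):
            plan.append((landmark, max(min_alpha, 1.0 - index * alpha_step)))
    return plan


heading_a = Landmark(page=1, x=0, y=0, w=100, h=20)
heading_b = Landmark(page=2, x=10, y=5, w=50, h=10)
picture = Landmark(page=2, x=0, y=100, w=80, h=80)
for landmark, alpha in paint_plan([heading_a, heading_b, picture]):
    print(landmark.page, landmark.area, alpha)
```

Here the two headings overlap and are painted largest first, while the picture forms a set of its own and is painted fully opaque.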

Note, in one embodiment, each of the sets of merged page images is created from a fresh, blank page. For example, when creating a merge page with salient features from four pages, the system does not combine two merge pages each of which contains salient features from two pages. Instead, the system combines the salient features of all four pages onto one new page image.

In one embodiment, the merged images are saved in a defined file hierarchy scheme so that the application responsible for displaying the images is able to locate them.

In order to reduce memory requirements and to speed reading images from the disk when necessary, relatively small thumbnails of merged images may be used. Such thumbnails can be stored at a reduced size and scaled up to full page images when they are displayed as the user scans through the document. Because the images are being displayed so quickly when the user is flipping through pages, the loss of quality from such scaling is likely tolerable.

Another option is to use an image representation like the JPEG2000 image representation scheme to store wavelet and delta information for merged images. Using this representation, the merged images may be built on the fly. This might reduce access to the file system at the expense of processing power.

In another embodiment, the merged images are created dynamically on the fly, including the identification of landmarks.

Control Devices

There are a number of ways of controlling the speed and direction of page scanning or navigating through the sequence of images. In one embodiment, the user knows at any time, without taking his eyes from the page image area of the display, whether scanning is going forwards or backwards. Tactile feedback from the control device is very useful for this purpose. In one embodiment, the user is also able to stop scanning instantly and reliably. In one embodiment, the user is able easily to both sense (ideally both visually and tactilely) and control the rate at which pages are being scanned.

In one embodiment, the user has a high degree of control over the speed of page flipping. This is especially important when the user is near the goal of their search in rapidly navigating an image-based document. That is, as the user nears the goal or page they have just been viewing, in one embodiment, the page flipping that is displayed slows down.

There are two ways to speed up the traversal of a document: showing more page images more quickly, or showing merged images that each represent a greater number of pages. However, if the pages are dense with landmarks, it may be better to show fewer of them at the same time. In one embodiment, a ShuttleXpress controller may be used as the control device. The speed is constantly monitored as pages are shown so that when the user lets go of the wheel, or as it returns to 0, the process automatically jumps back to the approximate page being viewed when the user let go.

In one embodiment, the system uses a speed control algorithm to control the speed at which the merged images appear. The speed control algorithm processes signals from a speed controller and computes values for direction, merge level (mergeLevel) and rate. MergeLevel refers herein to the number of pages per merged image, and rate refers herein to the number of merged images per second.

FIG. 7 is a flow diagram of one embodiment of a speed control process. The process may be performed by processing logic which may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 7, the process begins by processing logic inputting a control signal referred to in FIG. 7 as CMD. In one embodiment, the control signal is an integer in the range −7 to 7 (processing block 701). The range −7 to 7 represents the fifteen positions of a ShuttleXpress controller from Contour Design of Windham, N.H. The zero position is the “neutral” stopped position.

After inputting the control signal, processing logic records the time and the CMD (for use by a jump back process described later below) (processing block 702). Processing logic then tests whether the CMD is equal to zero (processing block 703). If it is, processing transitions to processing block 704 where the page refresh timer function is disabled and processing logic performs a jump back operation (processing block 705). One embodiment of the jump back process is described in more detail below. The page refresh timer function is described in more detail below as well. After performing a jump back operation, the speed control process ends.

If the input control signal CMD is not equal to zero, processing logic sets the direction equal to one if the input control signal CMD is greater than zero or −1 if the input control signal CMD is not greater than zero. Next, processing logic tests whether the absolute value of the input control signal CMD is less than or equal to 2.

Note that the number 2 is chosen because when events −2, −1, 1 and 2 are received, individual pages are shown rather than merged pages. Outside of that −2 to 2 range, merged pages are shown, starting with small merge amounts and slower frame rates at the lower values (−3 and 3) and increasing both towards the higher values (−7 and 7). In one embodiment, at the highest values, it is desirable to move through an entire document in under 3 seconds.

If the absolute value of CMD is less than or equal to 2, processing logic sets the mergeLevel equal to 1 (processing block 708) and sets the rate equal to two images per second if the absolute value of the input control signal CMD is equal to 1 or four images per second if the absolute value of the input control signal CMD is not equal to 1 (processing block 709), and then transitions to processing block 712. If the absolute value of the input control signal CMD is not less than or equal to 2, processing logic sets the mergeLevel as a function of the input control signal CMD and the size of the document (SizeOfDoc) (processing block 710), sets the rate as a function of the input control signal CMD and the size of the document (SizeOfDoc) (processing block 711), and transitions to processing block 712. In one embodiment, the size of the document (SizeOfDoc) is a constant determined when the document is first loaded. As mentioned above, the mergeLevel will be higher for larger documents so that at the highest values of CMD the entire document can be traversed in 3 seconds.

At processing block 712, processing logic sets the page refresh timer function to refresh at the rate indicated by the rate variable.
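The speed control flow of FIG. 7 can be sketched as a small function. This is only an illustrative sketch: the function name `speed_control` is hypothetical, and the formulas used for mergeLevel and rate when the absolute value of CMD exceeds 2 are assumptions, since the description says only that both are functions of CMD and SizeOfDoc chosen so that the whole document can be traversed in about 3 seconds at the highest setting.

```python
import math


def speed_control(cmd: int, size_of_doc: int, full_traverse_secs: float = 3.0):
    """Map a shuttle position (-7..7) to (direction, merge_level, rate).

    Returns None for cmd == 0, where the process disables the page refresh
    timer and hands off to the jump-back step instead.
    """
    if not -7 <= cmd <= 7:
        raise ValueError("cmd must be in the range -7..7")
    if cmd == 0:
        return None

    direction = 1 if cmd > 0 else -1
    level = abs(cmd)

    if level <= 2:
        # Individual pages: 2 images/s at position 1, 4 images/s at position 2.
        return direction, 1, (2.0 if level == 1 else 4.0)

    # ASSUMPTION: the source does not give these two formulas.
    # Rate grows linearly from 8 to 24 merged images per second.
    rate = 4.0 + 4.0 * (level - 2)
    # Fraction of full traversal speed: 0.2 at position 3, 1.0 at position 7.
    fraction = (level - 2) / 5.0
    target_pages_per_sec = fraction * size_of_doc / full_traverse_secs
    merge_level = max(2, math.ceil(target_pages_per_sec / rate))
    return direction, merge_level, rate
```

Under these assumed formulas, a 3000-page document at position 7 yields 42 pages per merged image at 24 merged images per second, which covers the document in just under 3 seconds.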

FIG. 8 is a flow diagram for the control process for the page refresh timer that refreshes the pages shown on the screen. The process is initialized (i.e., the values that control it, such as rate, direction, and mergeLevel are established) by the processing logic described above. Referring to FIG. 8, the page refresh processing entails showing the next mergeLevel pages in the direction specified by the direction variable set in the speed control process of FIG. 7 (processing block 801).

FIG. 9 is a flow diagram of one embodiment of the process for jumping back. The process is performed by processing logic which may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 9, the process begins with processing logic setting the variable nPPS (number of pages per second) equal to the average pages per second over the last 0.2 seconds (processing block 901). Next, processing logic sets the variable nMerge (number of the merge level) equal to the merge level as of 0.2 seconds ago (processing block 902). Processing logic also sets the variable minJumpBack, which refers to the minimum amount to jump back, equal to nPPS multiplied by 0.2 minus the value of the merge level (nMerge) and sets maxJumpBack, which indicates the maximum amount that can be jumped back, equal to nPPS multiplied by 0.2 plus the value of the merge level (nMerge) multiplied by 2. Processing logic also computes the most interesting page that is in the range between the minimum amount to jump back (minJumpBack) and the maximum amount to jump back (maxJumpBack) (processing block 904) and then goes back to this page, showing forms of feedback that will be described below (processing block 905).
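The jump-back range of FIG. 9 can be illustrated with a short sketch. The function names and the list of per-page interestingness scores are hypothetical, and the sketch assumes that maxJumpBack is read as nPPS × 0.2 + 2 × nMerge; the wording also admits other groupings of the arithmetic.

```python
def jump_back_range(n_pps: float, n_merge: int, window: float = 0.2):
    """Return (min_jump_back, max_jump_back) in pages."""
    base = n_pps * window                      # pages shown during the window
    min_jump_back = max(0, round(base - n_merge))
    # ASSUMPTION: "plus nMerge multiplied by 2" is read as base + 2 * nMerge.
    max_jump_back = round(base + 2 * n_merge)
    return min_jump_back, max_jump_back


def most_interesting_page(current, direction, min_jb, max_jb, interestingness):
    """Pick the highest-scoring page between min_jb and max_jb pages back.

    `direction` is +1 when scanning forwards and -1 when scanning backwards,
    so "back" always means against the scanning direction.
    """
    candidates = [current - direction * k for k in range(min_jb, max_jb + 1)]
    valid = [p for p in candidates if 0 <= p < len(interestingness)]
    if not valid:
        return current                         # nothing in range: stay put
    return max(valid, key=lambda p: interestingness[p])
```

For example, at 100 pages per second with a merge level of 8, the range is 12 to 36 pages back from the page on screen when the stop signal arrived.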

Alternatively, a scroll-bar-like interface may be used. In such a case, a strip is used along the side of the page display showing the current position within the document. The user may click and hold (the mouse, for example) above or below this position to move forward or back through the sequence of pages. The speed of the page flipping depends both on the distance of the mouse cursor from the current location and the length of time the user has been holding the mouse. If one were located near the beginning of the document and clicked and held near the end, the page scanning rate would gradually ramp up to full speed until approaching the location near the end where the cursor is being held, and would then gradually ramp down.

Navigation Feedback

In one embodiment, the system includes multiple forms of feedback to provide an indication to the user at any time whether one is scanning forwards or backwards. At low scanning rates, the system simulates a “page turn” by showing a vertical bar that appears to wipe across the display, revealing the next page. The wipe is right-to-left if scanning forwards and left-to-right if scanning backwards. The system may include an audible flipping sound that varies according to both direction and speed.

In another embodiment, the system automatically slows down scanning as it approaches especially important landmarks, like hand-drawn annotations or separator pages (see below). These slowdowns are referred to herein as “speed bumps.” This may be done only if the landmarks are relatively sparsely distributed. Thus, the system assumes that a user will want to see the landmarks clearly.

Examples of System

In one embodiment, the system for rapidly navigating image-based documents includes two parts. These parts may be the same system or may represent two separate systems. Examples of these systems are shown in FIGS. 10 and 11. FIG. 10 illustrates a system to perform the segmentation and merging system processes, while FIG. 11 illustrates the run-time application system portion for displaying the page images.

Referring to FIG. 10, the page image store 1001 stores image data. Segmentation software 1002, which may be stored in a memory and executed by a processor, obtains images from the page image store 1001 and performs salient feature identification. The results of the salient feature identification from segmentation software 1002 are sent to merging software 1003, which may be stored in a memory and executed by a processor. Merging software 1003 combines the salient features of multiple pages into a single merged image. The merged images resulting from merging software 1003 are stored back in page image store 1001.

FIG. 11 illustrates the run-time application system block diagram. Referring to FIG. 11, long-term page image store 1101 stores merged images. Long-term page image store 1101 may be the same as page image store 1001 of FIG. 10 or a different memory. There is a further repository of run-time data structures 1102. In one embodiment, data structures 1102 comprise page images 1110 and annotations 1111. Note that it is not necessary to include annotations 1111.

Application software 1105, which may be stored in a memory and executed by a processor, includes a display module 1113 which accesses data structures 1102 and displays page images 1110 and annotations 1111, if any, on display monitor 1103 according to the speed indicated by speed control module 1112. The speed control module 1112 is responsive to control signals 1106 from controllers 1104 to modify the speed of the merged images being displayed. Controllers 1104 may include one or more of a mouse, stylus, keyboard, shuttle wheel or any other well-known controller. The control signals from controllers 1104 indicate to speed control module 1112 how fast display module 1113 is to access memory and display merged images. In one embodiment, speed control module 1112 translates a requested velocity into a control of the display. Display module 1113 reads this information from speed control module 1112 to decide how fast to obtain page images (and annotations). Thus, display module 1113 chooses the sets of merged images to display based on speed control from speed control module 1112. Based on those inputs, display module 1113 shows full resolution individual page images or one or more merged page images.

In one embodiment, application software 1105 includes annotation module 1114 that is responsive to marking signals 1107 from controllers 1104. The user uses one of controllers 1104 to create an annotation on an image. For example, a user with a tablet and stylus may want to underline, circle, or otherwise annotate a page. Annotation module 1114 causes the annotation to be stored with page image 1110 corresponding to the page upon which the annotation was made. Display module 1113 accesses the annotation from data structures 1102 when displaying page image 1110 and merges the two together for display. Thus, the stored images of the merged pages are not modified. In one embodiment, whenever a page or merged image is shown, all the marks associated with that page or set of pages are shown as well. In an alternative embodiment, the stored or merged images may be modified to include all the marks associated with the image of the merged pages.

Example Functionality

In one embodiment, the system includes additional functionality. This may include chapter tinting or thumb indexing, ephemeral animations, and ephemeral separation pages.

In one embodiment, to help the user discern different sections of text, like chapters, the system enhances the document's own images by providing visual clues to chapter boundaries. For example, the system might perform chapter tinting by adding different pastel backgrounds to the images of the pages of each chapter. Also, the system might perform thumb indexing by artificially providing chapter thumb indexes analogous to those that some books provide.

In one embodiment, the system provides a facility for a user to insert “flip-book” style animations (ephemeral animations) that announce the importance of some page as it is being approached through rapid scanning. The quasi-annotations of which such an animation would consist would not appear on pages when viewed slowly or statically.

A special-purpose device that embodied the technology described herein might host a number of documents. In one embodiment, the system inserts distinct “separator pages” (ephemeral separator pages) that signal the transition from one document to another, thus avoiding the need for the user to switch to another viewing metaphor to choose among documents to scan.

Compensating for Cognitive Load in Jumping Back to One of Several Rapidly Displayed Images

One of the challenges encountered by the navigation system described herein is stopping at an appropriate place as page images of a document are being displayed rapidly. One of the primary uses (along with general browsing) of a rapid image flipping (RIF) system described herein is recognition-based navigation. A user may flip through a document the user viewed previously, looking for some particular “landmark” (e.g., a certain picture, graph, etc.). When the user recognizes that landmark, the user signals the system to stop flipping. It is not at all unthinkable that the user may have been quickly going through the document at 200 pages per second. By the time the user has recognized a landmark and convinced their hand to signal the system to stop, the landmark-containing image will no longer be on the display, and the system will need to do its best to try to jump back to the page the user recognized.

A simple approach to solving this problem posits a standard “reaction time,” multiplies this by the speed of page image display, and jumps back that far. However, user reaction time in such a system is itself a function of (among other things) the speed of page image display. One embodiment of the present invention provides a method for incorporating this realization into the behavior of the page scanning system.

The Naive Solution

The amount of time that it takes a user to recognize that he wants to stop and to then propagate that decision through the user's neural pathways and into the RIF system is referred to herein as “reaction time.” In one embodiment, the “reaction time” also includes the hysteresis the control device possesses that delays its response to the user's signal. If the system can keep track of the image display (flipping) speed, then it can multiply this by the reaction time to find out approximately which page caused the user to signal the system to stop the rapid display of page images. In one embodiment, if the page flipping speed over the course of the reaction time has not been constant, then the system “integrates the area under the curve” rather than multiplying.
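When the flipping speed varies during the reaction time, the “area under the curve” can be approximated numerically. The following is a minimal sketch, assuming the system has logged (timestamp, pages-per-second) samples with strictly increasing timestamps and that the last sample coincides with the stop signal; the function names and the use of trapezoidal integration with linear interpolation are illustrative choices, not taken from the description.

```python
def speed_at(samples, t):
    """Linearly interpolate the logged speed at time t (clamped at the ends)."""
    if t <= samples[0][0]:
        return samples[0][1]
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return samples[-1][1]


def pages_during_reaction(samples, reaction_time):
    """Integrate speed over the reaction window that ends at the last sample."""
    t_stop = samples[-1][0]
    t_start = t_stop - reaction_time
    # Window start point, followed by every logged sample inside the window.
    points = [(t_start, speed_at(samples, t_start))]
    points += [(t, v) for t, v in samples if t > t_start]
    # Trapezoidal rule over consecutive points.
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for (t0, v0), (t1, v1) in zip(points, points[1:]))
```

At a constant 200 pages per second and a reaction time of 0.125 seconds this reduces to the simple product, 25 pages.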

The use of reaction time in this manner, of course, is only an approximate solution, since reaction time is neither constant nor precise enough to distinguish among pages flashing by at 200/sec. However, it provides an idea of where to search to discover an “interesting” page that might have a landmark that the user recognized. The use of interestingness is discussed in further detail below.

A Novel Solution

A method for enhancing the accuracy of the estimate of the “operator-intended stopping point” within a rapid discontinuous or continuous image presentation system incorporates an estimate of the variation of the operator's reaction time due to the rate of image presentation. Knowing the reaction time, we can multiply it by the image flipping rate to decide how far to jump back.

In one embodiment, the RIF system takes into account that reaction time itself is strongly influenced by image flipping rate. Given a fixed set of images, and a fixed “landmark” at which to stop, the reaction time lengthens (i.e., the user reacts more slowly) as the speed of image presentation increases. Thus, one embodiment of the RIF system takes this phenomenon into account when trying to jump back to the desired image.

This may be further explained through the use of an example. Assume that the user's reaction time when viewing 25 images per second is 0.125 seconds. Assume that his reaction time slows by 20% for each doubling of the image display rate. (Formally, this is a conservative assumption, since it is sub-linear.) If the display rate is increased to 200 images per second, which is three doublings, his reaction time will degrade by a factor of 1.2³, or approximately 1.7. Without factoring in this degradation, the system would compensate for the user's reaction time by jumping back 25 pages (when displaying them at 200/sec). But factoring this in, it will jump back 42 pages. Even if the degradation is as little as 10% per doubling of page speed, the system would need to jump back 33 pages, rather than 25. This is a very significant difference, and a system that fails to take it into account will be experienced as severely lacking.
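The arithmetic in this example can be checked with a few lines of code. The function name is hypothetical, and the compounding-per-doubling model is simply the assumption stated in the example itself. With the unrounded factor 1.728 the result is about 43 pages; the figure of 42 follows from rounding the factor to 1.7 before multiplying.

```python
import math


def degraded_reaction_time(base_time, base_rate, rate, slowdown_per_doubling):
    """Reaction time after compounding a fixed slowdown per doubling of rate."""
    doublings = math.log2(rate / base_rate)    # 25 -> 200 images/s is 3 doublings
    return base_time * (1.0 + slowdown_per_doubling) ** doublings


rate = 200.0
naive_pages = rate * 0.125                                            # 25 pages
pages_20 = rate * degraded_reaction_time(0.125, 25.0, rate, 0.20)     # about 43.2
pages_10 = rate * degraded_reaction_time(0.125, 25.0, rate, 0.10)     # about 33.3
```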

In order to take the degradation of reaction time into account in jumping back to the desired image, the following formula for distance as a function of rate and time is the starting point.

d = r * t   (1)

where d represents the number of images to jump back, r represents the preceding image display rate, and t represents the reaction time. Let us use the subscript p to reflect the person whose reaction time is being represented, since different people have different reaction times.

d = r * t_p   (2)

To incorporate the fact that reaction time, itself, is a function of image flipping speed, the formula is further refined to be:

d = r * t_p(r)   (3)

Analyzing the function t_p(r), any given individual will have some fixed reaction time t_p under minimal cognitive load. This is then augmented by an additional increment of time due to cognitive load. This may be re-written as:

t_p(r) = t_p + l_p(r)   (4)

where l_p(r) is the load-attributable delay for person p at image flipping rate r. Substituting this into equation (3), the following is obtained:

d = r * (t_p + l_p(r))   (5)

Rewritten to make more explicit the two terms, only the first of which was accounted for in the naive solution, the following is obtained:

d = r * t_p + r * l_p(r)   (6)

The full nature of the final term of this formula, which represents the degradation of reaction time as cognitive load increases, can be determined by further experimentation. It seems likely to be at least linear. For simplicity, assume that it is, in fact, a linear function, i.e., that, for any given person, the added delay in reaction time is proportional to the speed of image display, so that l_p(r) = c_p * r. Since the coefficient is a constant for a given person, it is written as c_p. The equation is then written in the interesting form:

d = r * t_p + r * c_p * r   (7)

or

d = r * t_p + c_p * r²   (8)

The jump-back method, then, uses this formula to compute the distance to go back to find the user's desired image.

The user-specific constants are determined during an initial calibration process. Such a calibration process may include the running of a training phase in which the system receives user feedback and is able to arrive at the constants. Such training is well-known in the art.
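Equation (8) and one possible calibration step can be sketched as follows. The function names are hypothetical, and the calibration shown, an ordinary least-squares line fit of measured reaction times against display rate, is only an assumption about how the two per-user constants might be estimated; the description does not specify the training procedure.

```python
def jump_back_distance(rate, t_p, c_p):
    """Equation (8): d = r * t_p + c_p * r**2, in images."""
    return rate * t_p + c_p * rate ** 2


def fit_reaction_model(samples):
    """Fit t(r) = t_p + c_p * r to (rate, measured_reaction_time) pairs.

    ASSUMPTION: plain least squares over calibration trials run at
    at least two different display rates.
    """
    n = len(samples)
    mean_r = sum(r for r, _ in samples) / n
    mean_t = sum(t for _, t in samples) / n
    sxx = sum((r - mean_r) ** 2 for r, _ in samples)
    sxy = sum((r - mean_r) * (t - mean_t) for r, t in samples)
    c_p = sxy / sxx
    t_p = mean_t - c_p * mean_r
    return t_p, c_p
```

With illustrative values t_p = 0.125 seconds and c_p = 0.0004, the distance at 200 images per second is 25 + 16 = 41 images, compared with 25 for the naive estimate (c_p = 0).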

FIG. 12 is a flow diagram of another embodiment of a process for displaying images. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 12, the process begins by identifying salient content on pages of the image-based document (processing block 1201). In one embodiment, processing logic identifies the salient content by computing a histogram of each page of the image-based document and finding features based on information from the histogram.

After the salient content on the pages of the image-based document has been identified, processing logic creates page images by merging identified salient content of multiple pages of the image-based document into single merged pages (processing block 1202).

Once the merged pages have been created, processing logic selects page images for display from multiple page images corresponding to an image-based document (processing block 1203). The multiple page images include merged page images, and each merged page image has salient content from multiple successive page images of the image-based document merged into one image.

In one embodiment, selecting the page images for display is performed by selecting a subset of the page images based on control information (e.g., a user input) indicative of a rate at which the selected page images are to be serially displayed. In one embodiment, the page images include a first set of merged page images having salient content from a number of page images of the image-based document merged into one image and a second set of merged page images having salient content from a different number of page images of the image-based document merged into one image. Furthermore, in one embodiment, processing logic selects the page images for display by selecting the first set of merged page images if the speed, or rate, at which the document is to be displayed is at one rate and selecting the second set of merged page images if the rate at which the document is to be displayed is at a different rate.

After selecting the page images, processing logic serially displays the selected page images on a display (processing block 1204). Serially displaying the selected page images enables the document to be scanned at a speed exceeding the refresh rate of the display. The selected page images are displayed in an order, in either a forward or reverse (backward) direction.

While displaying the sequence of images, processing logic receives a user input to stop sequencing through the page images (processing block 1205).

In response to the user input, processing logic determines a location of a page image in the sequence of page images representing a point at which the user intended to stop in the sequence of page images but for display speed of the sequence of images and user reaction time in providing an indication of the point at which the user intended to stop (processing block 1206). In one embodiment, the location is selected based on the image display rate and the reaction time associated with the user. In another embodiment, the location is selected based on an interestingness measure of a page at the location.

In one embodiment, processing logic selects the reaction time associated with the user from multiple available reaction times based on the image display rate. Selection of the reaction time for the user may include selecting a first reaction time when the image display rate is at one rate and selecting another reaction time when the image display rate is at a second rate different from the first rate. The reaction times may be set as a result of a calibration process. Once the location has been determined, processing logic automatically jumps back to the page image at the location in the sequence of page images (processing block 1207).
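Selecting among several calibrated reaction times by display rate can be sketched as a table lookup. The table values below are made-up placeholders standing in for calibration results, and the rule of picking the entry with the largest rate threshold not exceeding the current rate is an assumption made for illustration.

```python
import bisect

# HYPOTHETICAL calibration table: (minimum display rate in images/s, reaction time in s).
REACTION_TABLE = [(25.0, 0.125), (50.0, 0.150), (100.0, 0.180), (200.0, 0.216)]


def select_reaction_time(rate, table=REACTION_TABLE):
    """Return the reaction time of the highest threshold that is <= rate."""
    thresholds = [threshold for threshold, _ in table]
    index = bisect.bisect_right(thresholds, rate) - 1
    return table[max(index, 0)][1]             # clamp rates below the first entry


def jump_back_pages(rate, table=REACTION_TABLE):
    """Pages to jump back: display rate times the selected reaction time."""
    return round(rate * select_reaction_time(rate, table))
```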

Enhancing Accuracy of Jumping Based on Interestingness

Another way to enhance the perceived accuracy of stopping when rapidly flipping among page images is to incorporate “interestingness estimates” for the various pages. In one embodiment, the distinctiveness of each page image is ranked against all other images. Because rapid flipping technology often shows more than one page image at a time, some further way of discovering where the user intended to stop is needed.

Many documents consist of lots of visually boring pages and a few more interesting ones. When scanning rapidly through a document, the user's attention is caught only by the visually interesting pages. Thus, in one embodiment, the system uses the visual distinctiveness of a page to further refine the estimate of where the user intended to stop.

Assuming that the rapidly-flipping user of the system noticed some highly salient feature of some image, once the system has discovered, through reaction time computation, the approximate portion of the document where the user intended to stop, the system compares all the nearby pages to determine which of them was most visually interesting or distinctive. This distinctiveness is based on an “interestingness score” computed by the system for each of the pages. The interestingness of a page may reflect the number and size of pictures, graphs, or tables it contains, the fact that it differs from other pages in the number of columns it has, the fact that it has a centered heading, etc. In one embodiment, based purely on the reaction time estimate, the system creates a simple probability curve among all pages in the range selected based on the reaction time, giving the likelihood that the user meant to stop on any given page. Finally, the system multiplies the interestingness measures by the probability curve in order to favor interesting pages near the centroid of the pages based on reaction time over ones further out.
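The final step, multiplying interestingness by a probability curve, can be sketched as follows. The Gaussian shape of the curve, the `spread` parameter, and the function name are assumptions made for illustration; the description says only that a simple probability curve centered on the reaction-time estimate is used.

```python
import math


def pick_stop_page(interestingness, center, spread):
    """Return the page index maximizing interestingness * probability.

    `center` is the page estimated from reaction time; `spread` controls
    how quickly the (assumed Gaussian) probability falls off around it.
    """
    def probability(page):
        return math.exp(-0.5 * ((page - center) / spread) ** 2)

    return max(range(len(interestingness)),
               key=lambda page: interestingness[page] * probability(page))
```

A moderately interesting page two pages from the estimate therefore wins over a slightly more interesting page thirty pages away, while a run of equally boring pages simply returns the estimate itself.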

Thus, the techniques described herein discover appropriate images by using very simple, non-domain-specific feature discovery and weighting to rank images in terms of interestingness or distinctiveness. This could have application in such diverse domains as video playback, information visualization, image-based document analysis, and surveillance and spy systems.

While these techniques are useful in the context of “jumping back” to a user's intended stopping place when viewing, for example, a document by flipping very rapidly among its pages in a forward or backward direction, they are also useful for discontinuous navigation, i.e., in jumping from one part of a document to another without showing the intervening pages.

One such situation arises when the user is flipping quickly through the document and sees a page she wants to stop on. By the time she has signaled the system to stop, she may easily be viewing a page 100 pages removed from the desired stopping point. In returning to what the system guesses to be that stopping point, the user doesn't want to see all the intervening pages again. As another example, the user may be near the beginning of a document and want to view something about 80% of the way through. She may wish to jump directly to that region.

Computing Interestingness of a Page

To compute the interestingness of a page, the system identifies features that make a page visually distinctive or an especially good place to stop. Such features may include, for example, having chapter headings; having section headings; number and size of pictures or graphs; number and size of tables; and number of columns.

In one embodiment, another interesting feature is the property of having an unusually large blank area on the preceding page. In many documents, this signals the start of a new chapter or section. The first page of the section may not, in fact, be all that visually distinctive, especially if some neighbor includes a big picture, but the page may yet be an important stopping place. In one embodiment, the system gives each of these features a weighting. This is to recognize that some of these are more important than others, so that a page with a big picture will be seen as more interesting than one with a section heading. Note that the system still wants one with a section heading to stand out from surrounding ones with no interesting features at all.

After weighting each feature, the system rates each page in terms of its variance from the typical page of the document as a whole (or from surrounding pages in the case of a really large document) on each of these features (dimensions). By computing the variance, the system actually allows a visually boring page to stand out within a bunch of other pages all of which have lots of pictures.

Finally, the system computes a total page interestingness score by multiplying the score on each dimension by that dimension's weight and summing the products. Note that the interestingness computation is described in more detail below.
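The weighted scoring just described can be sketched in a few lines. The feature names, the weights, and the choice of absolute deviation from the document-wide mean as the per-feature “variance from the typical page” are all assumptions for illustration; the description does not fix a particular deviation measure.

```python
from statistics import mean


def interestingness_scores(pages, weights):
    """Score each page by its weighted deviation from the typical page.

    pages:   list of dicts mapping feature name -> measured value
    weights: dict mapping feature name -> weight for that dimension
    """
    # The "typical page" is taken here to be the per-feature mean (assumption).
    typical = {f: mean(p.get(f, 0.0) for p in pages) for f in weights}
    return [
        sum(w * abs(p.get(f, 0.0) - typical[f]) for f, w in weights.items())
        for p in pages
    ]
```

Because the score measures deviation rather than raw feature counts, a page with no pictures among pages full of pictures scores higher than its neighbors, matching the behavior described above.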

Using this total page interestingness score, the system selects the most interesting page within a range of pages selected based on reaction time.

FIG. 13 is a flow diagram of yet another embodiment of a process for displaying images. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 13, the process begins by identifying salient content on pages of the image-based document (processing block 1301). In one embodiment, processing logic identifies the salient content by computing a histogram of each page of the image-based document and finding features based on information from the histogram.

After the salient content on the pages of the image-based document has been identified, processing logic creates page images by merging identified salient content of multiple pages of the image-based document into single merged pages (processing block 1302).

Once the merged pages have been created, processing logic selects page images for display from multiple page images corresponding to an image-based document (processing block 1303). The multiple page images include merged page images, and each merged page image has salient content from multiple successive page images of the image-based document merged into one image.

In one embodiment, selecting the page images for display is performed by selecting a subset of the page images based on control information (e.g., a user input) indicative of a rate at which the selected page images are to be serially displayed. In one embodiment, the page images include a first set of merged page images having salient content from a number of page images of the image-based document merged into one image and a second set of merged page images having salient content from a different number of page images of the image-based document merged into one image. Furthermore, in one embodiment, processing logic selects the page images for display by selecting the first set of merged page images if the speed, or rate, at which the document is to be displayed is at one rate and selecting the second set of merged page images if the rate at which the document is to be displayed is at a different rate.

After selecting the page images, processing logic serially displays the selected page images on a display (processing block 1304). Serially displaying the selected page images enables the document to be scanned at a speed exceeding the refresh rate of the display. The selected page images are displayed in an order, in either a forward or reverse (backward) direction.

While displaying the sequence of images, processing logic receives a user input to stop sequencing through the page images (processing block 1305).

In response to the user input, processing logic determines a location of a page image in the sequence of page images representing a point at which the user intended to stop in the sequence of page images but for display speed of the sequence of images and user reaction time in providing an indication of the point at which the user intended to stop (processing block 1306). In one embodiment, the location is selected based on visual distinctiveness of the page image at the location and the reaction time associated with the user. In another embodiment, the location is selected based on image display rate. In yet another embodiment, the location is identified based on a probability curve.

In one embodiment, processing logic rates each page in terms of its variance from other pages of the image-based document. The visual distinctiveness may be indicated, in part, by an interestingness measure. In one embodiment, processing logic computes an interestingness measure for each of a plurality of page images; identifies, independent of the distinctiveness measures, the area of successive page images using a probability curve based on the reaction time; multiplies the interestingness measures by the probability curve; and selects the page image at the location based on this product.

In one embodiment, the interestingness measure is based on the occurrence of one or more features on each page image. In such a case, processing logic rates each page in terms of its variance from other pages of the image-based document on each of the one or more features and computes a score for each page image based on a combination of scores associated with each of the one or more features. In one embodiment, processing logic computes the score for each page by multiplying the score for each feature by a feature weight and summing the products associated with each of the one or more features multiplied by its associated feature weight.

The features may include one or more of a group consisting of: chapter headings present, section headings present, number of any pictures or graphs present, size of any pictures or graphs present, number of any tables present, size of any tables present, number of any columns present, and presence of a blank area preceding a page image where the blank area is disproportionately larger than blank areas on other nearby pages. In one embodiment, each of the one or more features is weighted.

In one embodiment, processing logic ranks page images in a region thatimmediately surrounds and includes the page image at the location.

Once the location has been determined, processing logic automaticallyjumps back to the page image at the location in the sequence of pageimages (processing block 1307).

Note that this may be extended to any context of discontinuousnavigation within any image-based information space. The user mayindicate a desire to jump from the beginning of a document to a pointabout 80% of the way through it. Because there is no document overviewor set of thumbnails to choose from, the user's indication must beconsidered approximate. In one embodiment, the interestingnesstechniques described herein will be especially useful in determining alikely target.

Indeed such a method of discovering and revealing the most visuallyinteresting images among a set of mostly similar ones might be quiteuseful in document summarization or even as apparently unrelated adomain as information mining from image-based representations. It couldhave application in such diverse domains as video playback; informationvisualization; image-based document analysis; and surveillance andintelligence service (i.e., spy) systems.

Avoiding Disorientation Under Discontinuous Navigation

Using methods for estimation of reaction time and interestingness orimportance of the various pages of a document, the RIF system may beable to make a reasonably good guess as to the user's intended targetpage and jump directly to that guess. If the guess is incorrect, theuser does not know if the system has gone too far or not far enough andmay need to scroll around somewhat aimlessly. Such disorientation ishighly frustrating and makes for a disconcerting user experience.

A method for avoiding this disorientation is also described. Thetechnique provides orientation clues within a rapid image flippingsystem when discontinuously jumping to a new location by simulatingflipping through multiple images as the system approaches the newlocation. This helps a user of a rapid image flipping system understandany failure to jump to a user-intended location by simulating theflipping of multiple pages as the system approaches its guess of thatintended location.

There are many ways to provide an overview of a document, or the location of a page amid several surrounding pages, without breaking the page flipping metaphor of the RIF user experience. A much more “natural” feeling approach is to rapidly flip back, rather than simply jumping directly back, to the target page. This avoids showing many reduced-size images.

In one embodiment, the system shows a series of pages between ourcurrent location and the one toward which the system is backtracking.The series of images need not be started with our current location, solong as the system is reasonably sure to show any user-intended locationbetween the current location and the target page (the system's guess asto the user's intended location).

If the user notices the page she actually intended during thisbacktracking feedback, she would know the system has gone too far. Ifnot, she knows the system hasn't gone far enough, and it is easy andnatural to continue navigating in the correct direction.

One problem that arises with this approach is that it may soon come to feel tedious. If a user has to wait two or three seconds for a series of page images to flash by before seeing the system's guess of the user's intended stopping point, the user will tire of the exercise. Yet showing several pages may be essential, since the jump-back location is a guess, which may go quite wrong. In one embodiment, because merged images of several pages can be shown, the system is able to effectively show, e.g., 31 pages in the time it takes to show five images by successively showing merges of, for example, 16, 8, 4, 2 and 1 pages. The system further enhances the naturalness of the feedback by visually simulating the appearance of page turning in the appropriate direction.
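The arithmetic behind the 31-pages-in-five-images example can be sketched as follows. The halving schedule comes from the example above; the particular placement of the merged spans relative to the target page, and the function names, are illustrative assumptions.

```python
def merge_schedule(n_frames):
    """Number of pages merged into each successive frame: 16, 8, 4, 2, 1
    for five frames (each frame merges half as many pages as the last)."""
    return [2 ** k for k in range(n_frames - 1, -1, -1)]

def backtrack_frames(target, n_frames=5):
    """Return (first_page, last_page) spans for a backward jump that ends
    on `target`. Assumed layout: the last frame is the target page alone,
    and each earlier frame merges the pages just beyond the next span."""
    spans = []
    offset = 0
    for size in reversed(merge_schedule(n_frames)):  # 1, 2, 4, 8, 16
        spans.append((target + offset, target + offset + size - 1))
        offset += size
    spans.reverse()  # display order: largest merge first, target last
    return spans

schedule = merge_schedule(5)
frames = backtrack_frames(100)
pages_covered = sum(last - first + 1 for first, last in frames)
```

Here five displayed frames cover pages 100 through 130, so the user effectively sees 31 pages while the system approaches its guess.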

FIG. 14 is a flow diagram of still another embodiment of a process fordisplaying images. The process is performed by processing logic that maycomprise hardware (e.g., circuitry, dedicated logic, etc.), software(such as is run on a general purpose computer system or a dedicatedmachine), or a combination of both.

Referring to FIG. 14, the process begins by identifying salient contenton pages of the image-based document (processing block 1401). In oneembodiment, processing logic identifies the salient content by computinga histogram of each page of the image-based document and findingfeatures based on information from the histogram.

After the salient content on the pages of the image-based document hasbeen identified, processing logic creates page images by mergingidentified salient content of multiple pages of the image-based documentinto single merged pages (processing block 1402).

Once the merged pages have been created, processing logic selects pageimages for display from multiple page images corresponding to animage-based document (processing block 1403). The multiple page imagesinclude merged page images, and each merged page image has salientcontent from multiple successive page images of the image-based documentmerged into one image.

In one embodiment, selecting the page images for display is performed by selecting a subset of the page images based on control information (e.g., a user input) indicative of a rate at which the selected page images are to be serially displayed. In one embodiment, the page images include a first set of merged page images having salient content from a number of page images of the image-based document merged into one image and a second set of merged page images having salient content from a different number of page images of the image-based document merged into one image. Furthermore, in one embodiment, processing logic selects the page images for display by selecting the first set of merged page images if the speed, or rate, at which the document is to be displayed is at one rate and selecting the second set of merged page images if the rate at which the document is to be displayed is at a different rate.

After selecting the page images, processing logic serially displays the selected page images on a display (processing block 1404). Serially displaying the selected page images enables the document to be scanned at a speed exceeding the refresh rate of the display. The selected page images are displayed in an order, in either a forward or reverse (backward) direction.

While displaying the sequence of images, processing logic receives auser input to stop sequencing through the page images (processing block1405).

In response to the user input, processing logic determines a location ofa page image in the sequence of page images representing a point atwhich the user intended to stop in the sequence of page images(processing block 1406). In one embodiment, the location is selectedbased on visual distinctiveness of the page image at the location and/orthe reaction time associated with the user. In another embodiment, thelocation is selected based on image display rate. In yet anotherembodiment, the location is identified based on a probability curve.

Once the location has been determined, processing logic automaticallyjumps back to the page image at the location in the sequence of pageimages while displaying multiple intervening page images as the locationis being approached (processing block 1407). In one embodiment, themultiple intervening page images are located in the sequence of pageimages closer to the location of the page image than a location in thesequence of page images from which jumping back occurs. In oneembodiment, the multiple intervening page images comprise merged images.In one embodiment, processing logic displays multiple intervening pageimages in a manner visually simulating the appearance of pages beingturned in an appropriate direction.

An Algorithm for Computing the Visual Interestingness of Pages in a Document

One of the problems encountered by such a rapid page flipping system is stopping in an appropriate place. If the user is flipping rapidly through the pages of a document to look for a landmark found on one of the pages, that page will no longer be displayed by the time the user realizes it is time to stop and communicates that to the system. As previously described, there are several techniques that address this problem. Mainly, when jumping back to one of several rapidly displayed images, there are a number of less visually interesting pages (i.e., visually boring pages) among a few more interesting ones. Because the latter are more interesting, it is likely that the user wished to stop at one of them. Therefore, when jumping back, the system stops at one of the pages deemed to be more interesting. Thus, the method incorporates an interestingness measure into the behavior of the page navigation system.

One possible algorithm for determining the interestingness of the pages of the document is given below. In one embodiment, a set of pages is scored based on their interestingness. The interestingness of a page refers to its salience or distinctiveness. In one embodiment, the process for ranking pages consists of two operations. The first operation analyzes each of the pages to discover the important parts of their visual structure, characterizing the page in terms of various visual features. The second operation determines how different each page is from the other pages along these various dimensions. Finally, since some of the features are more important than others, the system weights each differently in determining an interestingness score for each page.

In one embodiment, the interestingness score for each page is based on visual features of the page. Thus, the system identifies features that make a page visually distinctive or a particularly good place to stop. In one embodiment, a set of properties that make a page visually distinctive includes having chapter headings (large fonts, possibly centered lines), having section headings (large fonts, short lines), including user supplied annotations, having a number and size of pictures or graphs, having a number and size of tables, having a number of columns, and/or having lots of blank space on a preceding page. Note that having a blank space on a preceding page may indicate the beginning of a chapter or section. Thus, even though the first page of a section may not, in fact, be visually distinctive, it may be an important place to stop, and having lots of blank space may indicate that a new chapter or section is starting.

Once the features that are indicative of interestingness have been identified, software code in the system discovers these within a page image. To discover these on a particular page, page parsing or segmentation is used. There are numerous examples of such page parsing or segmentation techniques that are well known in the art. The system determines the existence (or lack thereof) of various features on each page and then computes a numerical score for the page along each dimension. Thus, for each page, the system computes the values of a “feature vector” consisting of each of these features. For example, the “picture coverage” score for a page with nothing but text might be zero, while that for a page with a large picture covering half the page might be 50. The score for a “column count” feature for most pages might be one, but a few pages might have two columns and rate a column count of 2. For existence of annotations, in one embodiment, the system determines the number of separate regions on a page in which the user has left marks.

Thus, using the feature extraction and tallying, the system computes foreach feature for each page a numerical score. This feature vector thencharacterizes the salient visual properties of the page.

The next phase is to perform a distinctiveness determination. In one embodiment, distinctiveness of a page depends not only on its own features, but on how those features differ from the surrounding pages. To capture that distinctiveness, the system computes a score that represents, for each feature (“dimension”), how unusual the page is along this dimension. In one embodiment, this is done in two phases. First, for each dimension, the system computes a “distribution” table or histogram of the scores of all the pages along this dimension. For example, the “graphical coverage” histogram might look something like that of FIG. 17, representing a document in which most pages have no pictures and a few have large pictures.

Using the representation of how the scores are distributed along a given dimension, the system calculates for each score a measure of how unusual it is. In one embodiment, the system computes for each score what percentage of all the pages are “further from the mode” than is the given score. This may be performed as follows: first, the system computes the mode (the most common score) and then, for the score of interest, the system determines which side of the mode it falls on and computes how many pages have a score even more extreme (in the same direction) than the score being evaluated. This is similar to an “inverse percentile” score. In order to also account for scores that deviate from the mode on the other side of the mode, this count may be roughly doubled by adding to it the tallies for each score at the other extreme until a group of identical scores is reached that would (if it were added in) cause the other-side total to exceed the same-side total. This results in a computation, for each score, of the number of pages having a score (along this dimension) more extreme than that score. Subtracting this from the total number of pages and dividing the result by the total number of pages provides the proportion of pages that are not more extreme than the given page. For clarity, this may be multiplied by 100 to give a percentile ranking for each score along this dimension. This will be the distinctiveness rank, along this dimension, of any page with that score.

The next phase in the process performs a weighting operation. Different features may be assigned different weights. For example, the presence of a large picture on a page is probably more salient than is a minor section heading. In one embodiment, in computing the interestingness or distinctiveness score for a page, the system doesn't simply sum up the interestingness scores along all the dimensions; they are weighted differently. Table 1 illustrates a set of weightings over the set of features that may be used.

TABLE 1

Feature                       Weight
number of columns             20
presence of pictures          40
blankness of previous page    30
headings                      5
tables                        15
unusually big lines           5
annotations                   60

The system multiplies the values of the feature vector by the above weights and then sums the products in order to determine the interestingness value of a page as a whole.

In one embodiment, the system produces an interestingness value within a fixed range. That is, the system normalizes all scores to a range of values. In one embodiment, the range is 0 to 100. In order to normalize the scores, the system divides all the scores by the highest score and then multiplies them by the upper value of the desired range, which is 100 in one embodiment.
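The weighting and normalization steps can be sketched as follows. The weights are those of Table 1; the dictionary keys, the function names, and the sample per-feature values are hypothetical and chosen only for illustration.

```python
# Weights from Table 1; the short key names are illustrative assumptions.
WEIGHTS = {
    "columns": 20,
    "pictures": 40,
    "blank_previous": 30,
    "headings": 5,
    "tables": 15,
    "big_lines": 5,
    "annotations": 60,
}

def weighted_score(feature_values, weights=WEIGHTS):
    """Multiply each feature value by its weight and sum the products.
    Features missing from `feature_values` are treated as zero."""
    return sum(w * feature_values.get(name, 0) for name, w in weights.items())

def normalize(scores, upper=100):
    """Divide every score by the highest score and scale to 0..upper."""
    top = max(scores)
    if top == 0:  # guard added for this sketch: nothing stands out
        return [0 for _ in scores]
    return [upper * s / top for s in scores]

# Hypothetical per-feature values for three pages.
page_values = [
    {"pictures": 50},
    {"annotations": 100, "headings": 20},
    {},
]
raw = [weighted_score(v) for v in page_values]
interestingness = normalize(raw)
```

With these sample values the raw scores are 2000, 6100 and 0, so the annotated page normalizes to 100 and the featureless page to 0.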

The following series of flow charts describes an algorithm forcalculating the relative “interestingness” of each of a collection ofpages of a document.

The basic scalars and data structures are:

nFeatures—the number of features we extract from the pages, upon whichwe will base our computation.

weights[ ]—a vector of nFeatures integers representing the relativeimportance of the various features.

nPages—the number of pages in the document.

page—a data structure with at least two components.

-   features[ ]—an array of nFeatures integers, used to hold feature “scores” for the page.
-   interestingness—the value we will calculate.

pages[ ]—a vector of nPages page structures.
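One possible rendering of these scalars and data structures is sketched below. The use of a Python dataclass, the value of N_FEATURES (taken from the seven rows of Table 1), and the make_pages helper are assumptions for illustration rather than part of the described algorithm.

```python
from dataclasses import dataclass, field
from typing import List

# nFeatures: assumed here to be the seven features listed in Table 1.
N_FEATURES = 7

# weights[ ]: relative importance of each feature, in Table 1 order.
weights: List[int] = [20, 40, 30, 5, 15, 5, 60]

@dataclass
class Page:
    """The page structure with its two required components."""
    # features[ ]: one integer score per feature, initially zero.
    features: List[int] = field(default_factory=lambda: [0] * N_FEATURES)
    # interestingness: the value to be calculated.
    interestingness: float = 0.0

def make_pages(n_pages: int) -> List[Page]:
    """Build pages[ ], a vector of nPages independent page structures."""
    return [Page() for _ in range(n_pages)]

pages = make_pages(3)
```

The default_factory gives each page its own features list, so writing a score for one page does not affect the others.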

Scoring Features

When extracting and scoring features, the page is segmented to discoverits structure or the presence of various features upon which aninterestingness computation is based. In one embodiment, this isperformed using a Segment(page) function (not documented herein, butwell-known in the art) that produces whatever data for the page isneeded as input into a function that computes feature scores for thepage (e.g., the ComputeFeatureValue(page, featureI) function). Theoutput of this phase is to populate a page's features vector with the“scores” of that page for each feature.

FIG. 15 is a flow diagram of one embodiment of a process for extractingand scoring features. The process may be performed by processing logicwhich may comprise hardware (e.g., circuitry, dedicated logic, etc.),software (such as is run on a general purpose computer system or adedicated machine), or a combination of both.

Referring to FIG. 15, the process begins by processing logic setting a variable pageI equal to zero (processing block 1501) and setting a page variable equal to the value at the pageI location in the vector pages[ ] of nPages page structures, one for each page in the document (processing block 1502).

Next, processing logic segments the page using a segmentation process.In one embodiment, the segmentation process is performed by aSegment(page) function, which may be one of many well-known segmentationfunctions (processing block 1503).

After segmentation, processing logic creates a page.features vector ofnFeatures integers, where nFeatures refers to the number of featuresthat are extracted on the pages and upon which the computation ofinterestingness will be made (processing block 1504). This is where wewill store the feature score values that will be computed for eachfeature of the page. Also as part of processing block 1504, processinglogic sets the value of the featureI variable equal to zero.

Thereafter, processing logic sets a value at the featureI location inthe page.features vector equal to the result of calling a procedureentitled ComputeFeatureValue(page,featureI) (processing block 1505).This procedure will be described in more detail below. TheComputeFeatureValue procedure maps the segmented page information to ascore for the feature represented by featureI.

After performing the ComputeFeatureValue procedure and storing itsresult, processing logic tests whether the value of the featureIvariable is yet equal to the number of features that are being computedminus one. If it is not, there are additional features to compute, andprocessing logic transitions to processing block 1507 and then toprocessing block 1505 where the score for the next feature is computed.

Once all the features on the page have been scored, processing logicchecks if the pageI variable is equal to the number of pages in thedocument (nPages) minus one. If it is, then the process ends. If it isnot, processing logic increments the value of the pageI variable(processing block 1509) and transitions to processing block 1502 for thenext page in the document.
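The two nested loops of FIG. 15 can be sketched as below. The segment and compute_feature_value functions passed in are stand-in stubs (the real segmentation is outside the scope of this description), the dictionary representation of a page is an assumption, and the index tests of the flow chart are expressed as ordinary for loops over the same ranges.

```python
def score_all_pages(pages, n_features, segment, compute_feature_value):
    """Populate each page's features vector, following FIG. 15."""
    for page in pages:                        # pageI = 0 .. nPages-1
        segmented = segment(page)             # processing block 1503
        page["features"] = [0] * n_features   # processing block 1504
        for feature_i in range(n_features):   # featureI = 0 .. nFeatures-1
            # processing block 1505: score one feature of this page
            page["features"][feature_i] = compute_feature_value(
                segmented, feature_i)
    return pages

# Stub segmentation: simply repackages hypothetical raw page data.
def segment(page):
    return {"n_columns": page["raw_columns"],
            "picture_percent": page["raw_picture_percent"]}

# Stub scorer: feature 0 is the column count, feature 1 picture coverage.
def compute_feature_value(segmented, feature_i):
    if feature_i == 0:
        return segmented["n_columns"]
    return segmented["picture_percent"]

pages = [
    {"raw_columns": 1, "raw_picture_percent": 0},
    {"raw_columns": 2, "raw_picture_percent": 50},
]
score_all_pages(pages, 2, segment, compute_feature_value)
```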

ComputeFeatureValue(page, featureI)

The purpose of the ComputeFeatureValue function is to map a raw description of the presence, absence, or amount of a certain feature on a page to a score for that feature. In one embodiment, this function relies on the presence of several other feature-specific functions such as ComputeNColumnsVal(page), ComputeGraphicsVal(page), ComputeBlanknessVal(page), etc. Each of these computes a score for a specific feature. For example, the “graphics” score for a page with nothing but text might be zero, while for a page with a large picture covering half the page, the score might be 50. The score for the “nColumns” feature for most pages might be one, but a few pages might have two columns and be given a column count score of two. For existence of annotations, a determination is made as to the number of separate regions of the page in which the user has left marks.

FIG. 16 is a flow diagram of one embodiment of a process for computing a feature value. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 16, the process receives inputs of the values of the page and featureI variables (processing block 1601). Using these inputs, processing logic tests whether the value of the featureI variable equals the signature for the columns count feature (NCOLUMNS_FEATURES) (processing block 1602). If it does, processing logic sets the value of the val variable equal to the result of a function that computes a score indicative of the number of columns for the page (ComputeNColumnsVal(page)) and then the process outputs the value of the val variable (processing block 1610).

If the value of the featureI variable is not equal to the signature forthe columns count feature, processing logic tests whether its valueindicates the graphics feature (GRAPHICS_FEATURES). If so, processinglogic sets the value of the val variable to the result of performing afunction to compute a score for the graphic feature(ComputeGraphicsVal(page)) (processing block 1605) and the processoutputs the value of the val variable (processing block 1610).

If the value of featureI does not represent the graphics feature, processing logic tests whether it corresponds to a blankness feature (processing block 1606). If so, processing logic sets the value of the val variable to the result of computing a function that computes a score for the blankness feature for the page (ComputeBlanknessVal(page)) and outputs the value of the val variable (processing block 1610). In general, processing continues until the special ComputeXYZVal(page) function has been called corresponding to the given value of featureI, and its result has been returned.
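The chain of tests in FIG. 16 amounts to a dispatch on featureI, which can be sketched with a lookup table. The numeric values of the feature signatures, the BLANKNESS_FEATURES name, the dictionary keys, and the bodies of the three feature-specific functions are placeholders assumed for illustration.

```python
# Feature signatures; the numeric values are arbitrary assumptions.
NCOLUMNS_FEATURES, GRAPHICS_FEATURES, BLANKNESS_FEATURES = range(3)

# Placeholder scorers reading hypothetical segmentation results.
def compute_n_columns_val(page):
    return page.get("n_columns", 1)

def compute_graphics_val(page):
    return page.get("picture_percent", 0)

def compute_blankness_val(page):
    return page.get("blank_percent_previous", 0)

# One entry per feature replaces the sequence of tests in the flow chart.
FEATURE_FUNCTIONS = {
    NCOLUMNS_FEATURES: compute_n_columns_val,
    GRAPHICS_FEATURES: compute_graphics_val,
    BLANKNESS_FEATURES: compute_blankness_val,
}

def compute_feature_value(page, feature_i):
    """Call the feature-specific function selected by feature_i and
    return its result (the val output of processing block 1610)."""
    if feature_i not in FEATURE_FUNCTIONS:
        raise ValueError(f"unknown feature index: {feature_i}")
    return FEATURE_FUNCTIONS[feature_i](page)

sample_page = {"n_columns": 2, "picture_percent": 50}
val = compute_feature_value(sample_page, GRAPHICS_FEATURES)
```

A table-driven dispatch is one design choice among several; an if/elif chain mirroring the flow chart block by block would behave identically.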

Creating Distribution Tables for Each Feature

In one embodiment, the process of creating distribution tables for each feature essentially creates a histogram where the x-axis is the set of all possible scores for a particular feature and the y-axis is the number of pages that exhibit that score. The result of the algorithm is the creation of two vectors: maxVals and distributions. maxVals is a vector of nFeatures integers giving for each feature the maximum score for that feature found on any page. distributions is a vector of nFeatures vectors of integers. Each distributions[i] is a vector of maxVals[i]+1 integers giving the distribution table for the i'th feature.

FIG. 18 is a flow diagram of one embodiment of a process for creatingdistribution tables for each feature. The process is performed byprocessing logic which may comprise hardware (e.g., circuitry, dedicatedlogic, etc.), software (such as is run on a general purpose computersystem or a dedicated machine), or a combination of both.

Referring to FIG. 18, the purpose of the top half of the flow chart(i.e., above processing block 1808) is to compute the maximum observedvalue for each feature. The process begins by processing logic creatingthe maxVals vector of nFeatures integers, in which we will store themaximum score for each feature and setting the value of the featureIvariable equal to 0 (processing block 1801). Processing logic also setsthe value of the pageI variable equal to 0 and the featureI location inthe maxVals vector equal to zero (processing block 1802).

Processing logic begins computing the maximum value of each feature by setting the value of the page variable equal to the value in the pages vector at the pageI location and setting the value of the pageVal variable equal to the value at the featureI location in the page.features vector (processing block 1803). Processing logic then tests whether the current value of the variable pageVal is greater than the maximum value encountered so far for the particular feature (maxVals[featureI]) (processing block 1804). If so, processing logic sets the maximum value vector entry for the feature equal to the page value (processing block 1805) and transitions to processing block 1806. If not, processing logic transitions directly to processing block 1806.

At processing block 1806, processing logic tests whether the value of the pageI variable is equal to the number of pages in the document minus one. If not, processing logic increases the value of the pageI variable by one (processing block 1807) and transitions to processing block 1803 to continue with the next page. If the value of the pageI variable does equal the number of pages in the document minus one, processing logic transitions to processing block 1808.

The processing logic has now computed the maximum observed value for the given feature. At this point, the process proceeds to create the desired distribution table. At processing block 1808, processing logic creates for the given feature a vector of maxVals[featureI]+1 integers and stores it at the featureI'th position in the distributions vector. (It is assumed that all values of this new vector are initialized to zero.) Processing logic also sets the value of the pageI variable to zero.

The purpose of the next loop is to tally the various scores for the given feature and store the results in the newly created distributions vector. To do so, processing logic sets the value of the page variable equal to the value in the pages vector at the location specified by the value of the pageI variable and sets the value of the pageVal variable equal to the value in the page.features vector at the location indicated by the value of the featureI variable. pageVal now holds this page's score for the given feature. This occurrence of the given score is then tallied by incrementing the score's entry in the newly created vector (processing block 1809).

Thereafter, processing logic tests whether the value of the pageI variable is equal to the number of pages in the document minus one (processing block 1810). If not, processing logic increments the pageI variable by one (processing block 1811) and transitions to processing block 1809 in order to tally the score of the next page for this feature. If the final page has already been processed (pageI==nPages−1), processing logic transitions to processing block 1812, where processing logic tests whether the value of the featureI variable is equal to the number of features being considered minus one. If not, processing logic increments the value of the featureI variable by one (processing block 1813) and transitions to processing block 1802 to process the next feature. If so, the process ends.
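The process of FIG. 18 can be sketched as follows, under the assumption that all feature scores are non-negative integers so that a score can serve directly as an index into its distribution table. Each page is represented here simply by its list of feature scores, and the function name is illustrative.

```python
def build_distributions(pages_features, n_features):
    """Return (max_vals, distributions) as described for FIG. 18.

    pages_features[p][f] is the score of page p for feature f and is
    assumed to be a non-negative integer."""
    max_vals = []
    distributions = []
    for feature_i in range(n_features):
        # Top half of the flow chart: find the maximum observed score.
        max_val = 0
        for features in pages_features:
            if features[feature_i] > max_val:
                max_val = features[feature_i]
        # Processing block 1808: a zero-initialized table of maxVal+1 entries.
        table = [0] * (max_val + 1)
        # Processing block 1809: tally each page's score for this feature.
        for features in pages_features:
            table[features[feature_i]] += 1
        max_vals.append(max_val)
        distributions.append(table)
    return max_vals, distributions

# Four pages, two features: column count and picture coverage.
pages_features = [[1, 0], [1, 50], [2, 0], [1, 0]]
max_vals, distributions = build_distributions(pages_features, 2)
```

For the column count feature the resulting table is [0, 3, 1]: no page has zero columns, three pages have one column, and one page has two.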

Create Percentile Tables for Each Feature

FIGS. 19A and 19B are flow diagrams of one embodiment of a process for creating percentile tables for each feature. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

The process of FIGS. 19A and 19B is likely the central phase in the interestingness computation. It presupposes the prior execution of the process illustrated in FIG. 18, as it uses the maxVals and distributions tables computed thereby. For each feature, the process converts each observed score into a “percentile” ranking for that score that is intended to represent how “unusual” or “distinct” is a page that exhibits that score for that feature. The result of the process is the creation of one new vector referred to as percentiles[ ]. percentiles[ ] is a vector of nFeatures vectors of integers. Each percentiles[i] is a vector of maxVals[i]+1 integers giving the percentile rankings of all the observed scores for the i'th feature.

Referring to FIGS. 19A and 19B, the process begins by processing logicsetting the value of the featureI variable equal to 0, representing thefirst of the visual features for which a percentile table is beingcalculated (processing block 1901). Two other important variables needto be described at this point. The mode variable represents the “mostpopular” score for the given feature, and the valI variable ranges overall the observed scores for the feature. The processing logic sets thevalue of the valI variable to zero (the bottom of the range of allscores). The process illustrated in FIGS. 19A and 19B presupposes theexistence of a CalcMode function (illustrated in FIG. 20, and describedbelow), which computes from the distribution vector computed in FIG. 18the “most popular” score for any feature. The processing logic callsthis function for the feature represented by featureI and sets the valueof the mode variable equal to the result (CalcMode(featureI)). It thencreates a new vector of integers big enough to hold all the observedscores for this feature and stores this at the featureI'th location inthe percentiles vector (processing block 1902). This vector is where thepercentile rankings that are being computed for each score for thisfeature will be stored. The central operation in this process is foundin processing block 1919 in which a “percentile value” is stored for agiven feature score into the vector being constructed. This percentilevalue is intended to represent for a given score the proportion ofscores that are closer to the mode than that score. This is the meaningof the percentile variable which it is the task of most of the processto compute.

After initialization, the process begins by computing the percentileranking of the score valI for the given feature. Any given value forvalI will be greater than, less than, or equal to mode. These threecases are treated separately. First, the processing logic tests whetherthe value of the valI variable is greater than the value of the modevariable (processing block 1903).

If the value of valI is not greater than the value of mode, processing logic tests whether the value of valI is less than the value of mode (processing block 1904). If not, i.e., if the two values are equal, processing logic sets the value of the percentile variable equal to 0 (meaning that no scores are closer to mode than valI), and processing transitions to processing block 1919, where processing logic records this percentile ranking (zero). If, on the other hand, the value of valI is less than that of mode, processing logic begins to count the number of pages that have even lower scores for this feature, i.e., that are even further from mode in the same direction. The nOutside variable keeps track of this count, and processing logic initializes it to zero. The lessI variable ranges across the possible scores less than valI, and it, too, is initialized to zero (processing block 1912). Processing logic then tests whether the value of the lessI variable is (still) less than the value of the valI variable (processing block 1913). If so, processing logic increases the value of the nOutside variable by the number of pages with a score of lessI for this feature. Then processing logic increments the value of lessI, and processing returns to processing block 1913 to determine whether all the scores below valI have been covered. If so, then processing logic transitions to processing block 1915, where it begins to compute an estimate of how many page scores for this feature are more extreme than valI, but on the other (higher) side of mode. This tally is stored in the nOtherSide variable, which is initialized to 0. Another variable ranges over the higher score values (just as lessI did over the lower ones). This variable is referred to herein as moreI, and processing logic initializes it to the highest observed score, which was stored (during the first half of FIG. 18's processing) at the featureI location in the maxVals vector (processing block 1915).

A goal is to determine how many high page scores are more extreme than valI. This is accomplished by counting all the highest clumps of scores (a clump being the pages exhibiting a certain score) until some clump of scores would, if counted, push the total beyond the number of scores that were less than valI, which has been recorded in nOutside. So processing logic tests whether the current value of the nOtherSide variable plus the size of the next clump (distributions[featureI][moreI]) is greater than the value of nOutside (processing block 1916). If not, then processing logic increases nOtherSide by the size of that clump (distributions[featureI][moreI]) and decrements moreI in order to consider the next lower score (processing block 1917). Processing then transitions back to processing block 1916. When a clump of pages with the same score would, if added to nOtherSide, cause it to exceed nOutside, then processing logic transitions to processing block 1918, which will be discussed later. The processing up to this point (processing blocks 1912 to 1917) has, for the case in which valI is less than mode, ensured that nOutside and nOtherSide together contain the number of scores more extreme than valI.

Processing blocks 1905 through 1910 accomplish the same result for the case when valI is greater than mode. These are more or less the same as blocks 1912 through 1917. The moreI variable ranges over those scores greater than valI as a total for all those page scores greater than valI is accumulated in nOutside. Then lessI ranges across some of the lowest scores until nOtherSide holds an estimate of the number of low page scores for this feature that are more extreme than valI. Finally, processing logic transitions to block 1918.

Now, in processing block 1918, the processing logic performs the central calculations of this process. The process stores the sum of nOutside and nOtherSide in the nOutside variable. This now represents the total number of pages for which the score for this feature was more extreme than valI. The higher this number, the more typical is the valI score and the more boring any page that exhibits it. Since an estimate of the interestingness of pages is being sought, rather than one of their boringness, processing logic then calculates the number of pages that have an equal or less extreme score than valI, namely nPages−nOutside, and stores this value in the variable nInside. Finally, processing logic converts this into a percentile score by dividing it by the total number of pages and multiplying by 100, and stores this value in the percentile variable (processing block 1918).

Finally, in processing block 1919, the percentile ranking for the valI score for the current feature is stored into the percentiles table being computed. (Recall that percentiles[featureI] is a vector of the percentile rankings of all observed scores for the feature represented by featureI.)

The processing logic then tests whether there are any more observed scores for this feature for which it needs to compute the percentile ranking by comparing valI to the maximum observed score for this feature (stored at maxVals[featureI]). If so, i.e. if valI is not yet equal to maxVals[featureI]−1, processing logic increments valI and transitions to processing block 1903 to calculate the percentile ranking of the score represented by the new value of valI.

If, on the other hand, all observed scores for this feature have been considered, processing transitions to block 1921, which determines whether there are any more features to consider by comparing featureI to nFeatures−1. If there are more features (i.e. featureI is not yet equal to nFeatures−1), processing logic increments the feature index, featureI, and loops back around to block 1902 to begin calculating the mode and then the percentile rankings for observed scores for the next feature. If there are no more features to consider, processing stops, having fully populated the percentiles table.
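
The percentile ranking computation described for FIG. 19 can be sketched for a single feature in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name percentile_rankings is hypothetical, the distribution is assumed to be a list indexed by score (standing in for distributions[featureI], with its last index standing in for maxVals[featureI]), scores are assumed to run from 0 through the maximum observed score inclusive, at least one page is assumed to exist, and the two symmetric branches (valI below or above mode) are folded into a single scan loop.

```python
def percentile_rankings(distribution):
    """Return one percentile ranking per score for a single feature.

    distribution[score] is the number of pages exhibiting that score
    (assumed layout; stands in for distributions[featureI]).
    """
    n_pages = sum(distribution)           # assumed to be at least 1
    max_val = len(distribution) - 1       # stands in for maxVals[featureI]
    # First score with the highest tally (ties go to the lowest score).
    mode = max(range(len(distribution)), key=lambda v: distribution[v])

    percentiles = []
    for val_i in range(max_val + 1):
        if val_i == mode:
            # No score is closer to the mode than the mode itself.
            percentiles.append(0.0)
            continue

        if val_i < mode:
            # Pages even further below the mode than val_i.
            n_outside = sum(distribution[:val_i])
            # Scan the other side from the highest score downward.
            scan = range(max_val, -1, -1)
        else:
            # Pages even further above the mode than val_i.
            n_outside = sum(distribution[val_i + 1:])
            # Scan the other side from the lowest score upward.
            scan = range(0, max_val + 1)

        # Count whole clumps on the other side until the next clump
        # would push the tally past n_outside.
        n_other_side = 0
        for other_i in scan:
            if n_other_side + distribution[other_i] > n_outside:
                break
            n_other_side += distribution[other_i]

        n_outside += n_other_side
        n_inside = n_pages - n_outside
        percentiles.append(100.0 * n_inside / n_pages)
    return percentiles
```

For example, with ten pages whose scores for a feature are distributed as [1, 2, 5, 2] (one page with score 0, two with score 1, and so on), the mode is 2, and the most extreme scores, 0 and 3, receive the highest rankings.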

Calculate the Mode Value of a Distribution Table

FIG. 20 is a flow diagram of one embodiment of a process for calculating the “mode” value of the distribution of a given feature. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

Referring to FIG. 20, the process begins with processing logic receiving the value of the featureI variable as an input (processing block 2001), representing the feature whose most popular score is being computed. The process iterates over the observed scores for this feature and sees which has the most pages that exhibit that score. Processing logic begins by setting the value of the valI variable, which iterates over observed scores, equal to 0, the value of the highSoFar variable equal to 0, and the value of the mode variable equal to 0 (processing block 2002). Processing logic then sets the value of the val variable equal to distributions[featureI][valI], which holds the tally of the number of pages that exhibit a score of valI for this feature (processing block 2003).

Processing logic tests whether the value of the val variable is greater than the value of the highSoFar variable, which keeps track of the number of pages that exhibit the most popular score encountered so far (processing block 2004). If so, then processing logic sets the value of the mode variable equal to the value of the valI variable and sets the value of the highSoFar variable equal to the value of the val variable (processing block 2005). In any case, processing logic tests whether it has tested all of the observed scores by comparing the value of the valI variable to the value stored at the location in the maxVals vector specified by the value of the featureI variable (processing block 2006). If they are not equal, then processing logic increments the value of the valI variable (processing block 2007) and then processing transitions back to processing block 2003. If they are equal, the process ends with the output of the value of the mode variable, which represents the most popular score for that feature.
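
The mode calculation just described can be sketched in Python as follows. This is an illustrative sketch only: the function name calculate_mode and its two-argument form are assumptions, with distribution standing in for distributions[featureI] and max_val standing in for maxVals[featureI].

```python
def calculate_mode(distribution, max_val):
    """Return the most popular score for one feature.

    distribution[score] is the number of pages exhibiting that score;
    max_val is the highest observed score (both are assumed inputs).
    """
    high_so_far = 0   # largest page tally seen so far
    mode = 0          # score that produced that tally
    for val_i in range(max_val + 1):
        val = distribution[val_i]
        # Strict comparison: on a tie, the earlier (lower) score is kept.
        if val > high_so_far:
            mode = val_i
            high_so_far = val
    return mode
```

Because the comparison is strictly "greater than", two scores with the same tally resolve to the lower score, which is one consequence of the flow as described.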

Calculate Each Page's Importance as a Weighted Sum of Feature Distinctiveness Scores

FIG. 21 is a flow diagram of one embodiment of a process for determining each page's importance. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

It will be observed that the flow diagram consists of an inner and an outer loop, where the outer loop iterates over pages, while the inner loop iterates over features to compute the interestingness score for that page.

The process combines, for each page, the distinctiveness of its various feature scores to obtain an “interestingness” score for the page as a whole. The process is a simple “weighted sum” computation. Weights are assigned to each feature, and these are multiplied by the “percentile” ranking of the page's score for the given feature, with the products being summed. The result of this algorithm is the computation of the page.interestingness value.

Referring to FIG. 21, processing logic begins by setting the value of the pageI variable equal to 0 (processing block 2101) and sets the value of the page variable equal to the value at the pageI location in the pages vector, the value of the score variable equal to 0, and the value of the featureI variable equal to 0 (processing block 2102).

In the main processing block, block 2103, processing logic increments the value of the score variable by the weighted value of the current feature. Processing logic sets the value of the featureVal variable to the value at the featureI location in the page.features vector, representing the score for this feature on this page. Processing logic then extracts the distinctiveness value of this score from the percentiles table (percentiles[featureI]) and stores this in the percentile variable. Processing logic then weights this by multiplying it by weights[featureI] and stores this product in the featureScore variable, which it finally adds to the accumulating score variable.

Processing logic then checks whether to continue the inner loop. It tests whether the value of the featureI variable equals the value of the nFeatures variable minus one (processing block 2104). If not, processing logic increments the value of the featureI variable by one (processing block 2105) and processing returns to processing block 2103. If so, then processing logic stores the page's interestingness score by setting the value of the page.interestingness variable equal to the value of the score variable (processing block 2106). Processing logic then checks whether to continue the outer loop. It tests whether the value of the pageI variable equals the value of the nPages variable minus one (processing block 2107). If not, then processing logic increments the value of the pageI variable by one (processing block 2108) and processing logic returns to processing block 2102, so that the next page is loaded without resetting pageI. If so, then the process ends.
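
The weighted-sum computation of FIG. 21 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the Page class and the function name score_pages are hypothetical, and the percentiles table is assumed to be indexed first by feature and then by score (percentiles[featureI][featureVal]), which the description implies but does not spell out.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page record: one score per feature, plus the result."""
    features: list
    interestingness: float = 0.0


def score_pages(pages, percentiles, weights):
    """Set page.interestingness to the weighted sum of percentile rankings.

    percentiles[feature_i][score] is the ranking of that score (assumed
    layout); weights[feature_i] is the weight assigned to the feature.
    """
    for page in pages:                                    # outer loop: pages
        score = 0.0
        for feature_i, feature_val in enumerate(page.features):  # inner loop
            percentile = percentiles[feature_i][feature_val]
            feature_score = weights[feature_i] * percentile
            score += feature_score
        page.interestingness = score
```

With two features weighted 0.5 and 2.0, a page whose scores have percentile rankings 90 and 50 would receive 0.5 × 90 + 2.0 × 50 = 145 before normalization.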

Normalize All the Page Interestingness Scores

In one embodiment, all of the page.interestingness values are in a predictable range. An algorithm is described below that ensures that all of the page.interestingness values will fall between 0 and 100.

FIG. 22 is a flow diagram of one embodiment of a normalization process. The process is performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, etc.), software (such as is run on a general purpose computer system or a dedicated machine), or a combination of both.

The process described by FIG. 22 contains two loops. The first calculates a maximum page interestingness score which it stores in the max variable. The second arranges that all interestingness scores are scaled to fall in the range between 0 and 100.

Referring to FIG. 22, processing logic begins by setting the value of the pageI and max variables equal to 0 (processing block 2201). Processing then sets the value of the page variable equal to the value at the pageI location in the pages vector and sets the value of the pageScore variable equal to the value of the page.interestingness variable (processing block 2202).

Processing logic then tests whether the value of the pageScore variable is greater than the value of the max variable (processing block 2203). If so, then processing logic sets the value of the max variable equal to the value of the pageScore variable (processing block 2204). Next, processing logic tests whether it has finished iterating through all the pages by testing whether the value of the pageI variable is equal to the value of the nPages variable minus one (processing block 2205). If not, processing logic increments the value of the pageI variable by one (processing block 2206) and processing transitions back to processing block 2202.

If all the pages have been examined for their score (pageI==nPages−1), processing logic proceeds to initialize two variables preparatory to the second loop. A correction factor is calculated which, when multiplied by the page interestingness scores, will scale them all to fit within the range 0 to 100, and this is stored in the correctionFactor variable. In one embodiment, this correction factor is computed by dividing 100 by the maximum observed interestingness score, which was calculated by the first loop and stored in the variable max. The pageI variable is then set to 0, in order to begin iterating through the pages again (processing block 2207).

In processing block 2208, processing logic scales the current page's overall interestingness score by the correction factor as follows. The current page's data structure is extracted from the pages vector and stored into the page variable (page=pages[pageI]). The interestingness score is multiplied by the correction factor and the product is stored back into the interestingness field of the data structure (page.interestingness*=correctionFactor).

Processing logic then tests whether it has yet iterated through all the pages performing this scaling operation by testing whether pageI==nPages−1. If not, it increments the page index variable, pageI, and returns to processing block 2208. If so, the process is complete.
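
The two-loop normalization of FIG. 22 can be sketched in Python as follows. This is an illustrative sketch only: the function name normalize_interestingness is hypothetical, pages are assumed to be any objects with an interestingness attribute, and the guard against a maximum of zero is an added assumption that does not appear in the description.

```python
from types import SimpleNamespace


def normalize_interestingness(pages):
    """Scale every page.interestingness so that the largest becomes 100."""
    # First loop: find the maximum interestingness score.
    max_score = 0.0
    for page in pages:
        if page.interestingness > max_score:
            max_score = page.interestingness

    # Assumption (not in the description): leave scores untouched when
    # every page scored zero, to avoid dividing by zero.
    if max_score == 0:
        return

    # Second loop: multiply each score by the correction factor.
    correction_factor = 100.0 / max_score
    for page in pages:
        page.interestingness *= correction_factor


# Usage with stand-in page objects.
example_pages = [SimpleNamespace(interestingness=s) for s in (50.0, 25.0, 0.0)]
normalize_interestingness(example_pages)
```

In the usage example the maximum score is 50, so the correction factor is 2 and the three scores become 100, 50, and 0.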

An Exemplary Computer System

FIG. 23 is a block diagram of an exemplary computer system that may perform one or more of the operations described herein. Referring to FIG. 23, the computer system may comprise an exemplary client or server computer system. The computer system comprises a communication mechanism or bus 2311 for communicating information, and a processor 2312 coupled with bus 2311 for processing information. Processor 2312 includes, but is not limited to, a microprocessor such as, for example, a Pentium™, PowerPC™, or Alpha™ processor.

The computer system further comprises a random access memory (RAM), or other dynamic storage device 2304 (referred to as main memory) coupled to bus 2311 for storing information and instructions to be executed by processor 2312. Main memory 2304 also may be used for storing temporary variables or other intermediate information during execution of instructions by processor 2312.

The computer system also comprises a read only memory (ROM) and/or other static storage device 2306 coupled to bus 2311 for storing static information and instructions for processor 2312, and a data storage device 2307, such as a magnetic disk or optical disk and its corresponding disk drive. Data storage device 2307 is coupled to bus 2311 for storing information and instructions.

The computer system may further be coupled to a display device 2321, such as a cathode ray tube (CRT) or liquid crystal display (LCD), coupled to bus 2311 for displaying information to a computer user. An alphanumeric input device 2322, including alphanumeric and other keys, may also be coupled to bus 2311 for communicating information and command selections to processor 2312. An additional user input device is cursor control 2323, such as a mouse, trackball, trackpad, stylus, or cursor direction keys, coupled to bus 2311 for communicating direction information and command selections to processor 2312, and for controlling cursor movement on display 2321.

Another device that may be coupled to bus 2311 is hard copy device 2324, which may be used for printing instructions, data, or other information on a medium such as paper, film, or similar types of media. Furthermore, a sound recording and playback device, such as a speaker and/or microphone, may optionally be coupled to bus 2311 for audio interfacing with the computer system. Another device that may be coupled to bus 2311 is a wired/wireless communication capability 2325 to communicate with a phone or handheld palm device.

Note that any or all of the components of the computer system 2300 and associated hardware may be used in the present invention. For example, in one embodiment, a joystick or shuttlewheel device for generating directional and intensity control signals is included. However, it can be appreciated that other configurations of the computer system may include some or all of the devices.

Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.

1. A method comprising: selecting page images for display from a plurality of page images corresponding to an image-based document, the plurality of page images including merged page images and each merged page image having salient content from multiple successive page images of the image-based document merged into one image; and serially displaying the selected page images on a display.
2. The method defined in claim 1 wherein selecting the page images for display comprises selecting a subset of the plurality of pages images based on a control information indicative of a rate at which the selected page images are to be serially displayed.
3. The method defined in claim 2 wherein serially displaying the selected page images enables the document to be scanned at a speed exceeding a refresh rate of the display.
4. The method defined in claim 1 wherein the plurality of page images includes a first set of merged page images having salient content from a first number of page images of the image-based document merged into one image and a second set of merged page images having salient content from a second number of page images of the image-based document merged into one image, the first and second numbers being different, and further wherein selecting the page images for display comprises selecting the first set of merged page images if the rate at which the document is to be displayed is at a first rate and selecting the second set of merged page images if the rate at which the document is to be displayed is at a second rate.
5. The method defined in claim 1 further comprising: identifying salient content on pages of the image-based document; and creating the plurality of page images by merging identified salient content of multiple pages of the image based document into single images.
6. The method defined in claim 5 wherein creating the plurality of pages comprises: creating a first set of merged pages having salient content from a first number of page images of the image-based document merged into one image; creating a second set of merged pages having salient content from a second number of page images of the image-based document merged into one image, the second number being a multiple of the first number; creating a third set of merged pages having salient content from a third number of page images of the image-based document merged into one image, the third number being a multiple of the second number; and wherein selecting the page images comprises selecting the first, second or third sets based on a speed control input.
7. The method defined in claim 5 wherein identifying the salient content comprises: computing a histogram of each page of the image-based document; and finding features based on information from the histogram.
8. The method defined in claim 7 wherein finding the features comprises one or more of: finding lineheights and separations; finding headers; finding footers; finding columnation; finding blank areas; finding tables; finding graphical images; and finding text areas.
9. The method defined in claim 7 wherein finding the features comprises analyzing text to identify one or more of a group consisting of headings, boldness, justification, centering, line height, line separation and indentation.
10. The method defined in claim 7 wherein merging identified salient content of multiple pages of the image-based document into single pages comprises: adding overlapping salient content from two or more pages to a single page image in decreasing size order.
11. The method defined in claim 7 wherein merging identified salient content of multiple pages of the image-based document into single pages comprises: adding overlapping salient content from two or more pages to a single page image in an order with increasing transparency as each one of the overlapping salient content is added.
12. The method defined in claim 7 wherein merging identified salient content of multiple pages of the image-based document into single pages comprises adding text to a single page image resulting from merging two or more pages, the added text being a version of original text from one of the two or more pages that has been modified to display differently than the original text.
13. The method defined in claim 1 wherein page images are displayed, as a function of an input control, according to a merge level indicative of a number of pages of the image-based document per merged image and a rate indicative of the number of merged page images per second based on the merge level.
14. The method defined in claim 13 wherein the page images are displayed in order forward or backward based on a direction indication.
15. The method defined in claim 13 wherein the merge level and the rate are set as a function of the input control and the size of the image-based document.
16. The method defined in claim 1 wherein serially displaying the selected page images on the display comprises displaying the selected page images in an order either forward or backward, and further comprising jumping back to one or more selected page images based on the input control.
17. The method defined in claim 16 wherein jumping back to one or more selected page images comprises jumping back to a page image based on an interestingness measure associated with the page image.
18. An article of manufacture having one or more recordable media storing instructions thereon which, when executed by a system, causes the system to perform a method comprising: selecting page images for display from a plurality of page images corresponding to an image-based document, the plurality of page images including merged page images and each merged page image having salient content from multiple successive page images of the image-based document merged into one image; and serially displaying the selected page images on a display.
19. The article of manufacture defined in claim 18 wherein selecting the page images for display comprises selecting a subset of the plurality of pages images based on a control information indicative of a rate at which the selected page images are to be serially displayed.
20. The article of manufacture defined in claim 19 wherein serially displaying the selected page images enables the document to be scanned at a speed exceeding a refresh rate of the display.
21. The article of manufacture defined in claim 18 wherein the plurality of page images includes a first set of merged page images having salient content from a first number of page images of the image-based document merged into one image and a second set of merged page images having salient content from a second number of page images of the image-based document merged into one image, the first and second numbers being different, and further wherein selecting the page images for display comprises selecting the first set of merged page images if the rate at which the document is to be displayed is at a first rate and selecting the second set of merged page images if the rate at which the document is to be displayed is at a second rate.
22. The article of manufacture defined in claim 18 wherein the method further comprises: identifying salient content on pages of the image-based document; and creating the plurality of page images by merging identified salient content of multiple pages of the image based document into single images.
23. The article of manufacture defined in claim 22 wherein creating the plurality of pages comprises: creating a first set of merged pages having salient content from a first number of page images of the image-based document merged into one image; creating a second set of merged pages having salient content from a second number of page images of the image-based document merged into one image, the second number being a multiple of the first number; creating a third set of merged pages having salient content from a third number of page images of the image-based document merged into one image, the third number being a multiple of the second number; and wherein selecting the page images comprises selecting the first, second or third sets based on a speed control input.
24. The article of manufacture defined in claim 22 wherein identifying the salient content comprises: computing a histogram of each page of the image-based document; and finding features based on information from the histogram.
25. The article of manufacture defined in claim 24 wherein finding the features comprises one or more of: finding lineheights and separations; finding headers; finding footers; finding columnate; finding blank areas; finding tables; finding graphical images; and finding text areas.
26. The article of manufacture defined in claim 24 wherein finding the features comprises analyzing text to identify one or more of a group consisting of headings, boldness, indentation, justification, centering, line heights and line separations.
27. The article of manufacture defined in claim 24 wherein merging identified salient content of multiple pages of the image-based document into single pages comprises: adding overlapping salient content from two or more pages to a single page image in decreasing size order.
28. The article of manufacture defined in claim 24 wherein merging identified salient content of multiple pages of the image-based document into single pages comprises: adding overlapping salient content from two or more pages to a single page image in an order with increasing transparency as each one of the overlapping salient content is added.
29. The article of manufacture defined in claim 24 wherein merging identified salient content of multiple pages of the image-based document into single pages comprises adding text to a single page image resulting from merging two or more pages, the added text being a version of original text from one of the two or more pages that has been modified to display differently than the original text.
30. The article of manufacture defined in claim 18 wherein page images are displayed, as a function of an input control, according to a merge level indicative of a number of pages of the image-based document per merged image and a rate indicative of the number of merged page images per second based on the merge level.
31. The article of manufacture defined in claim 30 wherein the page images are displayed in order forward or backward based on a direction indication.
32. The article of manufacture defined in claim 30 wherein the merge level and the rate are set as a function of the input control and the size of the image-based document.
33. The article of manufacture defined in claim 18 wherein serially displaying the selected page images on the display comprises displaying the selected page images in an order either forward or backward, and wherein the method further comprises jumping back to one or more selected page images based on the input control.
34. The article of manufacture defined in claim 33 wherein jumping back to one or more selected page images comprises jumping back to a page image based on an interestingness measure associated with the page image.